Book 1
Generating Diversity


S294 Cell biology


The Open University


This publication forms part of the Open University module S294 Cell biology. The complete list of texts which make up this module
can be found at the back. Details of this and other Open University modules can be obtained from the Student Registration and Enquiry
Service, The Open University, PO Box 197, Milton Keynes MK7 6BJ, United Kingdom (tel. +44 (0)845 300 60 90; email
general-enquiries@open.ac.uk).

Alternatively, you may visit the Open University website at www.open.ac.uk where you can learn more about the wide range of
modules and packs offered at all levels by The Open University. 

To purchase a selection of Open University materials visit www.ouw.co.uk, or contact Open University Worldwide, Walton Hall, 
Milton Keynes MK7 6AA, United Kingdom for a catalogue (tel. +44 (0)1908 274066; fax +44 (0)1908 858787; email
ouw-customer-services@open.ac.uk).


MIX: Paper from responsible sources. FSC C016278


The Open University, Walton Hall, Milton Keynes MK7 6AA 

First published 2012. Second edition 2014 

Copyright © 2012, 2014 The Open University 

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, transmitted or utilised in any form or by 
any means, electronic, mechanical, photocopying, recording or otherwise, without written permission from the publisher or a licence 
from the Copyright Licensing Agency Ltd. Details of such licences (for reprographic reproduction) may be obtained from the Copyright 
Licensing Agency Ltd, Saffron House, 6-10 Kirby Street, London EC1N 8TS (website www.cla.co.uk).

Open University materials may also be made available in electronic formats for use by students of the University. All rights, including 
copyright and related rights and database rights, in electronic materials and their contents are owned by or licensed to The Open 
University, or otherwise used by The Open University as permitted by applicable law. 

In using electronic materials and their contents you agree that your use will be solely for the purposes of following an Open University 
course of study or otherwise as licensed by The Open University or its assigns. 

Except as permitted above you undertake not to copy, store in any medium (including electronic storage or use in a website), distribute, 
transmit or retransmit, broadcast, modify or show in public such electronic materials in whole or in part without the prior written 
consent of The Open University or in accordance with the Copyright, Designs and Patents Act 1988. 

Edited and designed by The Open University. 

Typeset by The Open University. 

Printed and bound in the United Kingdom by Halstan & Co. Ltd, Amersham, Bucks. 

ISBN 978 1 7800 7880 9 

2.1 


Contents


Introduction to S294 Cell biology


Chapter 1 The first cells
Carol Midgley

1.1 The origins of life on Earth
1.2 Early cells
1.3 Synthetic life?
1.4 Final word
1.5 Learning outcomes


Chapter 2 An introduction to cell diversity
Jill Saffrey

2.1 Introduction
2.2 How cells are studied: microscopy and cell culture
2.3 Prokaryotic cell diversity
2.4 Eukaryotic cell diversity
2.5 Final word
2.6 Learning outcomes


Chapter 3 A tour of the cell
Jill Saffrey

3.1 Introduction
3.2 How is subcellular organisation studied?
3.3 The organisation of prokaryotic cells
3.4 The organisation of eukaryotic cells
3.5 Cell interactions in tissues
3.6 Final word
3.7 Learning outcomes


Chapter 4 Inheritance
Robert Saunders

4.1 Introduction
4.2 Genes and genomes
4.3 Cell division, chromosome segregation and genetic variation
4.4 Mendelian genetics
4.5 Recombination frequencies and genetic maps
4.6 Sex-linked genes
4.7 Human pedigree analysis
4.8 Non-nuclear inheritance
4.9 Discontinuous versus continuous variation
4.10 Final word
4.11 Learning outcomes


Chapter 5 Genes and genomes
Robert Saunders

5.1 Introduction
5.2 DNA is the genetic material
5.3 DNA replication
5.4 DNA repair
5.5 The flow of information: from DNA to protein synthesis
5.6 The consequences of mutations
5.7 Genes and genomes in prokaryotes
5.8 Eukaryotic genes and chromosomes
5.9 Viral genomes
5.10 Genome sequencing and genomics
5.11 Variation and evolution of genomes
5.12 Final word
5.13 Learning outcomes


Chapter 6 The control of gene expression
Carol Midgley

6.1 Introduction
6.2 An overview of gene expression
6.3 Control of prokaryotic gene transcription
6.4 Control of eukaryotic gene transcription
6.5 Post-transcriptional control of gene expression in eukaryotes
6.6 Translation
6.7 Final word
6.8 Learning outcomes


Acknowledgements


Module team


Index


Introduction to S294 Cell biology 


The existence of cells was first reported more than 300 years ago. Since then, 
the study of cells has progressed rapidly as many different types have been 
described and analysed experimentally. Scientists now have a very detailed 
knowledge of the fundamental aspects of cells: their chemical composition and 
physical properties, the processes that go on inside them, the interactions 
between them, and the specialised functions that characterise distinct cell 
types. Despite this rapid progress, there is still much to learn about how cells 
function, and thousands of new scientific discoveries are published on this 
subject every year. 


Cell biology research has also played a crucial role in understanding human 
health and disease, and has the potential to solve some global problems by 
providing, for example, new sources of energy or food. 


■ Identify a technological or medical use of cell biology that you have
recently read about in a news report or magazine article, or heard about
in a television or radio programme.


□ You may have identified one of the following: a treatment for cancers;
vaccination against infectious disease; improvements in human fertility; 
the detection of congenital birth defects; new drugs derived from animal 
or plant cells; stem cell therapy to replace diseased or damaged tissues in 
the human body; forensic analysis in crime and archaeology. 


Biotechnology — the use of living cells and organisms and their cellular 
processes to manufacture products for health, industry and medicine — already 
generates annually many billions of dollars worldwide, and an understanding 
of basic cell biology is increasingly important in order to be able to make 
informed decisions about the scientific, economic, ethical and safety issues 
that surround the exploitation and manipulation of cells. There is no doubt 
that this is an exciting time to be involved in an area of science that has such 
enormous potential. 


Book 1 of this module begins with a brief discussion of some of the current 
theories about the origins and evolution of cells, before exploring the diversity 
of current-day cell types and their fundamental structure and organisation. The 
basis of cell diversity is the genetic material, deoxyribonucleic acid (DNA), 
and the role of DNA in determining the characteristics of cells, and their 
inheritance from one generation to another is also examined in this book. 


Despite the diversity in living organisms and in the cells of which they are 
composed, it is remarkable how many activities are common to all cells. 
Book 2 will explore how individual cells ‘work’: how they obtain energy and 
synthesise new components, how they communicate with each other, and how 
they move and interact with their environment. Book 3 will look at how cells 
grow and divide and respond to challenges such as extreme environments and 
disease. This final book will also discuss some of the modern technologies 
that exploit the properties of cells. 



Chapter 1 The first cells


This first chapter begins with the origins of cells, and the fundamental 
characteristics which define a living cell or organism. Activity 1.1, which you 
will study at several points in this chapter, introduces some of these 
characteristics through the work of research scientists who study different 
aspects of cell structure and function. 


(LOs 1.1, 1.2 and 1.3) Allow 15 minutes (or 90 minutes for the 
complete video and associated notes) 

In this activity you will watch a video entitled Cell: The spark of life. The 
video provides a fascinating introduction to some aspects of cells and cell 
technologies that you will encounter as you study this module. You can 
choose to watch the entire video now (about 1 hour), or you may watch it in
five separate parts at the points indicated as you read through Chapter 1. 


The video is accompanied by some notes and questions to aid your 
understanding. 


1.1 The origins of life on Earth


All living things are composed of cells, and indeed the cell is the basic unit of 
life. Bacteria are a familiar example of unicellular organisms and they are the 
most abundant type of organism on Earth, both in terms of absolute numbers, 
and the number of different species. Examples are found in almost every 
environment. Multicellular organisms, including plants, animals and the 
larger fungi, consist of more than one cell. Most multicellular organisms have 
many different types of cells with specialised and distinct roles. An 
understanding of cells — their composition, organisation, structure, properties 
and behaviour — is fundamental to all biological sciences. 


This chapter considers how the first cells may have arisen and evolved, and in 
doing so identifies some of the key characteristics of cells — in other words, 
what constitutes a living cell. Later in this book you will learn about the 
cellular diversity which, ultimately, is the basis for the vast diversity of life on 
Earth. The place to start the story, however, is long before the appearance of 
living organisms, with an exploration of the chemical and physical events that 
could have made life possible. 


1.1.1 Before cells 


Present-day cells synthesise large, complex macromolecules, such as proteins
and DNA, which are each composed of thousands of individual atoms. Many
of these assemble into larger structures in the cell. It can be difficult to
imagine how such a complex unit as the cell and its component
macromolecules could have arisen spontaneously, but the most widely
accepted view is that life on Earth was made possible by the interaction of
simple chemical components present in the Earth’s early environment.


Note that viruses are not composed of cells, and are not generally
considered to be living organisms. They are effectively parasites of living
cells, and cannot proliferate independently.


Megaannum (Ma) means ‘million years’. A million is a thousand
thousand, or 10⁶.


Figure 1.1 Hydrogen bonding between three water molecules
(red = oxygen atom; white = hydrogen atom).


At the time when life first emerged, it is thought that conditions on Earth were 
very different from now. The Earth was formed from material that 
accumulated in the early Solar System, eventually forming a ball of molten 
rock around 4600 million years (Ma) ago. By about 4100 Ma ago, there was 
probably a solid rocky crust, and more volcanic activity than there is now. It 
is thought that nitrogen (N2) was present in the atmosphere, together with
gases produced by volcanoes, including carbon dioxide (CO2), water vapour
(H2O), and probably smaller amounts of hydrogen (H2), hydrogen sulfide
(H2S) and carbon monoxide (CO); but very little free oxygen (O2). Because
conditions were so hostile, it is likely that no life existed on Earth before 
3800 Ma ago, but there is some chemical evidence of microbes (microscopic 
organisms) in rocks that are 3400 Ma old, so these organisms must have 
emerged some time before that. 


The most abundant molecule in all present-day organisms is water (H2O), 
which contributes 60-80% of the mass of every cell. Water is essential for 
life. Each water molecule consists of two hydrogen (H) atoms linked to an 
oxygen (O) atom (Figure 1.1). The O-H bond between each hydrogen atom 
and the oxygen atom is a covalent bond, in which the two atoms share a pair 
of negatively charged electrons. The electrons in the O-H bond are, however, 
not shared equally. The oxygen atom attracts the negatively charged electrons 
towards itself more strongly than the hydrogen atom, with the result that the 
oxygen in a water molecule has a weak negative charge, represented as delta
minus (δ−), while the hydrogens have a weak positive charge (δ+) (Figure 1.1).
This makes water a polar molecule; that is, different parts of the molecule 
have opposite charge. 


■ How does the polar nature of individual water molecules affect their
interaction with each other?


□ The positively and negatively charged regions of different water
molecules are attracted to each other, forming weak bonds between the
molecules (Figure 1.1).


The formation of these weak hydrogen bonds between water molecules is 
why water exists as a liquid over a wide range of temperatures. Liquid water 
is an efficient solvent with the ability to dissolve other substances, providing a 
medium in which the chemical reactions necessary for life can occur. 
Scientists therefore look for water wherever they search for life on other 
Earth-like planets. 


As well as water, current-day cells contain many other molecules, chemical 
compounds and ions (atoms or groups of covalently bonded atoms that are 
positively or negatively charged), but perhaps the most distinctive feature of 
cells is that their structure is largely composed of organic molecules, which 
contain the element carbon. Carbon can form strong covalent bonds by 
sharing electrons with up to four other atoms, and is the basis of many 
different types of small organic molecules. Cells are able to link together such
small carbon-containing monomers to form long chains, or polymers, in a
process called polymerisation. In current-day cells, polymerisation usually 
requires the activity of cellular proteins called enzymes, which you will 
encounter later in the module. 


Cellular macromolecules are made up of one or more polymer chains. The 
building blocks that are used to build proteins, carbohydrate polymers 
(polysaccharides), nucleic acids and lipids are, respectively, amino acids, 
monosaccharides, nucleotides, and fatty acids and glycerol (Table 1.1). 




As well as carbon (C), organic 
molecules may contain hydrogen 
(H), oxygen (O), nitrogen (N), 
phosphorus (P) and sulfur (S); 
approximately 92% of the dry 
weight of a cell is contributed by 
these six elements. Some 
examples of organic molecules 
are shown in Figure 1.3. 


Logically, before large macromolecules existed, these organic building blocks
must have become available to the first cells. What do we know about how
and when this happened?


Table 1.1 Summary of the macromolecular components of cells.


Macromolecule: proteins
Constituent molecules: amino acids
Functions: Perform many functions: for example, as enzymes (which facilitate
chemical reactions), structural proteins, contractile proteins, transporters
and signal receptors. Glycoproteins (proteins with polysaccharides attached)
on the cell surface have roles in cell adhesion and recognition, and
lubrication (e.g. mucus).

Macromolecule: polysaccharides
Constituent molecules: monosaccharides
Functions: Provide energy and a means of storing energy. In plants and
fungi, they provide support to cell walls.

Macromolecule: nucleic acids (DNA and RNA)
Constituent molecules: nucleotides
Functions: Hold the genetic information necessary for protein synthesis and
the inheritance of cell characteristics.

Macromolecule: lipids (fats)
Constituent molecules: fatty acids and glycerol
Functions: Energy storage, signalling molecules and components of cell and
organelle membranes.


Laboratory experiments have demonstrated that the organic molecules that are 
the essential building blocks of large macromolecules can form by chemical 
reactions between small inorganic molecules (Box 1.1). Although these 
experiments did not precisely reproduce the conditions on Earth 3800 Ma ago, 
they clearly demonstrated that spontaneous synthesis of organic molecules
from simple components was possible.


Activity 1.1 (continued) The spark of life (Part 2)


(LO 1.1) Allow 25 minutes
Return to Activity 1.1 and watch Part 2 of the video. 


Box 1.1 Chemical evolution of organic molecules 


In the 1920s, the Russian scientist Alexander Oparin and the British 
scientist J. B. S. Haldane separately proposed the theory that energy from 
the Sun’s ultraviolet (UV) radiation (which in our modern era is largely 
absorbed by the ozone layer in the Earth’s atmosphere), or from 
lightning, could have caused molecules of gas in the primordial 
atmosphere to react with water, forming a mixture of simple organic 
compounds. Haldane called this mixture the ‘prebiotic soup’. 


In 1953, a graduate student in the US called Stanley Miller was the first 
to produce some scientific evidence in support of this theory. Miller was 
a student of Harold Urey, who had proposed that the early atmosphere of 
the Earth was probably rich in the gases ammonia (NH3), methane (CH4) 
and hydrogen (H2). Miller constructed the experimental apparatus 
illustrated in Figure 1.2, in which a mixture of these gases was subjected 
to an electrical discharge to simulate the effects of lightning, while water 
was refluxed (kept continually boiling and condensing) in the closed 
circuit. Any new water-soluble molecules that were produced could be 
sampled during the course of the experiment. 


Miller ran his experiment for about a week, and subsequent analysis of 
the samples he collected showed that they contained significant amounts 
of various organic molecules. Among these were five types of amino 
acid (including glycine, alanine and aspartic acid), the monomers that 
cells use to make proteins. The structures of the simple inorganic 
molecules that were used in the experiments carried out by Miller and 
other researchers, and of some of the organic products, are represented in 
Figure 1.3. Miller’s experiment didn’t prove that these reactions actually 
occurred on early Earth, but it did demonstrate that simple chemical 
reactions can give rise to reactive intermediates such as hydrogen 
cyanide and formaldehyde which in turn can react to form amino acids 
and other organic building blocks. Further experiments have produced 
nucleic acid bases similar to those found in nucleic acids. 


In 2008, a group of scientists used more modern techniques to analyse 
the contents of 11 surviving vials from Miller’s experiments of the 
early 1950s. In some vials, where Miller had slightly altered the 
conditions of his experiment by including a jet of air and steam to 
simulate a volcanic eruption, the group found more organic molecules 
than Miller had originally recorded, including at least 22 types of amino 
acid. Perhaps volcanic regions may have become rich in organic 
molecules through similar chemical activity. 


Figure 1.2 Miller’s experimental apparatus (surviving label: water containing
organic compounds).


In 1969, fragments of a meteorite that fell over Murchison, Australia,
were shown to contain amino acids and nucleic acid bases.


Figure 1.3 Molecules in the Miller-Urey experiments.

Reactants in Miller-Urey experiments (labels that survive): hydrogen (H2),
ammonia (NH3); energy is supplied.

Examples of some of the intermediate chemical products: hydrogen cyanide
(HCN), formaldehyde (CH2O), formic acid (HCOOH), acetic acid (CH3COOH);
energy is supplied.

Examples of some of the amino acid products: glycine (H2NCH2COOH), alanine
(H2NCH(CH3)COOH), aspartic acid (H2NCH(CH2COOH)COOH).


Once simple organic building blocks were available in sufficient quantities, 
the formation of polymers could have occurred, given a suitable environment. 
When heated in certain conditions, amino acids polymerise into short chains 
called peptides, which, with the addition of more amino acids, become longer 
polypeptides, which can fold and interact in complex ways to form functional 
proteins. Such reactions require the monomers to be present in very high 
concentrations. On the early Earth, polypeptides may have first formed where 
amino acids accumulated in water on the surface of rocks or clay particles that 
promoted the chemical reactions leading to polymerisation. In present-day 
cells, proteins are the ‘workhorses’ that perform a range of cell functions, 
from catalysis (speeding up the rate at which reactions occur) to 
communication and maintenance of cell structure. You will learn more about 
different proteins and what they do in the cell throughout the module. 


An essential feature of living cells is that they are capable of synthesising 
their own proteins, and they are able to pass the information for protein 
synthesis on to their ‘offspring’, the new cells formed when they divide. This 
property of passing on information may have represented the first real step 
towards the evolution of a living organism, and it involves a different type of 
macromolecule — nucleic acids. 


1.1.2 Molecules that contain hereditary information 


All present-day cells contain one or more molecules of the nucleic acid 
deoxyribonucleic acid (DNA) which encode(s) the information required for 
the cell to manufacture a particular set of proteins that determine the 
characteristics of that cell. DNA is also the hereditary material that is copied 
and passed on to the offspring when a cell divides to form two new ‘daughter 
cells’. In this way, the characteristics of a particular type of cell are passed on 
from one cell generation to the next. 


The entirety of an organism’s hereditary material encoded by its DNA is 
known as its genome. Each cell carries at least one complete copy of the 
genome, which includes many hundreds or thousands of genes; these are 
sections of the DNA, each of which encodes a gene product, usually a protein. 
The genome also includes a large amount of ‘non-coding’ DNA, some of 
which helps to regulate protein synthesis (Chapters 5 and 6). The genes are 
the heritable units that are passed on from one cell generation to the next. 
This continuity of information is critical for the precise and timely production 
of the proteins that are responsible for the characteristic structure of cells and 
the processes of life, including the ability to capture and utilise energy, to 
synthesise macromolecules from materials obtained from the surrounding 
environment, and indeed to replicate the genetic material itself and make new 
cells with the same characteristics as the parent cell. 




Nucleic acids are large macromolecules built from thousands of monomers 
called nucleotides (Table 1.1). The structure of DNA is, famously, a double 
helix of two polymer chains, or strands, composed of four different types of 
nucleotide that differ in the nature of the base group that they contain: adenine 
(A), thymine (T), guanine (G) or cytosine (C). It is the sequence in which 
these four nucleotides occur in the DNA strands that ultimately specifies the 
sequence of amino acids in the polypeptide encoded by a gene. In order to 
make a protein, the DNA sequence of the appropriate gene is first copied, or 
‘transcribed’, into a shorter, temporary ‘messenger’ molecule — another type of 
nucleic acid, called ribonucleic acid (RNA). The sequence of this messenger 
RNA (mRNA) is then ‘translated’ by a complex cell structure called a 
ribosome into a sequence of amino acids which are linked together to form a 
polypeptide (Figure 1.4a). You will learn more about the processes of 
transcription and translation in Chapter 6 of this book. 


Figure 1.4 (a) The flow of information from genomic DNA to synthesis of the
encoded mRNA (transcription) and its translation to the polypeptide product.
(b) Copying, or replication, of the DNA strands.
Labels: (a) DNA, transcription, mRNA, translation, ribosome, newly formed
polypeptide molecule; (b) new strands.
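The flow of information from DNA to polypeptide described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it reads the mRNA directly off the DNA coding strand (replacing T with U) rather than modelling the template strand, it assumes that the mRNA is read in groups of three nucleotides (codons), a detail not covered in this section, and it uses a small excerpt of the standard genetic code. The names `CODON_TABLE`, `transcribe` and `translate` are hypothetical and chosen for this example.

```python
# Excerpt of the standard genetic code: only the codons needed for this example.
CODON_TABLE = {
    "AUG": "Met", "GGU": "Gly", "GCU": "Ala", "GAU": "Asp",
    "UAA": "STOP", "UAG": "STOP", "UGA": "STOP",
}


def transcribe(coding_strand: str) -> str:
    """Return an mRNA with the same sequence as the DNA coding strand (T -> U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> list[str]:
    """Read the mRNA three nucleotides at a time until a stop codon is reached."""
    amino_acids = []
    for i in range(0, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "STOP":
            break
        amino_acids.append(residue)
    return amino_acids


dna = "ATGGGTGCTGATTAA"          # hypothetical gene fragment
mrna = transcribe(dna)
polypeptide = translate(mrna)
print(mrna)                      # AUGGGUGCUGAUUAA
print("-".join(polypeptide))     # Met-Gly-Ala-Asp
```

In the cell, transcription and translation are carried out by enzymes and ribosomes respectively; the two functions stand in for those steps only at the level of sequence.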


As well as encoding protein sequence, the double-stranded structure of DNA
also provides a template for the synthesis of a copy of itself to pass on to new
daughter cells (Figure 1.4b). You will learn more about the copying of DNA,
known as DNA replication, in Chapter 4 of this book.
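How a double-stranded molecule can act as a template for its own copy can be sketched as follows. The sketch assumes the standard base-pairing rules (A pairs with T, G pairs with C) and that the two strands run in opposite directions, neither of which is spelled out in this section; the function names are hypothetical. Each original strand is used as the template for one new strand, so both daughter molecules carry the same sequence as the parent.

```python
# Standard base pairing between the two strands of a DNA double helix.
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(template: str) -> str:
    """Build the strand that pairs with `template`, read in the opposite direction."""
    return "".join(PAIRING[base] for base in reversed(template.upper()))


def replicate(strand_1: str) -> tuple[tuple[str, str], tuple[str, str]]:
    """Copy a double helix: each old strand templates one new strand."""
    strand_2 = complementary_strand(strand_1)            # the existing partner strand
    daughter_a = (strand_1, complementary_strand(strand_1))
    daughter_b = (complementary_strand(strand_2), strand_2)
    return daughter_a, daughter_b


daughter_a, daughter_b = replicate("ATGGGT")
print(daughter_a)   # ('ATGGGT', 'ACCCAT')
print(daughter_b)   # ('ATGGGT', 'ACCCAT')
```

The sketch copies without error; in cells, occasional copying errors are one source of the genetic variation discussed in Box 1.2.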


Present-day living organisms all use DNA as the hereditary molecule. Note, 
however, that whilst DNA can act as a template for its own replication, in 
cells this process is highly complex and requires the action of several catalytic 
proteins (enzymes). The synthesis of these enzymes requires the information 
encoded by DNA, so this presents rather a conundrum. How was 
polymerisation of nucleotides catalysed in early cells that lacked protein 
enzymes? The discovery in the 1980s that in certain circumstances RNA 
molecules can have catalytic activity similar to that of an enzyme, and can 
catalyse the polymerisation of nucleotides, has led to the suggestion that the 
first life may have been based on self-replicating RNA molecules, not DNA. 
DNA may have gradually become established as the universal genetic material 
because it is a more chemically stable molecule than RNA. The remarkable 
properties of self-replicating nucleic acid molecules may have been the key 
component that opened up the way to rapid evolution and the diversification 
of organisms (Box 1.2). 


Activity 1.1 (continued) The spark of life (Part 3) 


(LOs 1.1 and 1.2) Allow 20 minutes 
Return to Activity 1.1 and watch Part 3 of the video. 


Box 1.2 Variation, natural selection and evolution 


When the genome of an organism is replicated and passed on to its 
offspring, some rearrangements and errors in the sequence of nucleotides 
may be incorporated into the new DNA molecules. This results in 
genetic variation; that is, the genome of the offspring is similar but not 
quite identical to that of the parent. You will learn more about genetic 
variation in Chapter 4 of this book. Genetic variation results in 
phenotypic variation between the offspring of an organism and between 
the offspring and the parent. A phenotype is an observable characteristic 
of an organism such as its morphology (shape and appearance), 
behaviour, or biochemical or physiological properties. So, the offspring 
of an organism will have slightly different characteristics to those of its 
siblings and parent. 


■ Why is this type of phenotypic variation across a population of
organisms essential to the continued survival of that population?


□ It allows the population to adapt over time to new or changing
environments, for example different food sources or predators.


In 1858, Charles Darwin and Alfred Russel Wallace independently
proposed the theory of natural selection as the process that drives
speciation, the appearance of distinct new biological species. A species
is often defined as a population of organisms that have similar
characteristics, and are able to breed with each other to produce fertile
offspring.


There is no single definition of ‘species’. Modern biologists often use
characteristics such as similarity of genomes, morphology or ecological
niche (the organism’s way of life) to differentiate between species.


Darwin and Wallace were unaware of the existence of DNA and the 
mechanism of inheritance, but they reasoned that within a population, 
certain phenotypic variants with particular characteristics will perform 
better in their environment than other individuals, and will tend to 
produce more offspring. This is sometimes referred to as fitness, as in 
‘survival of the fittest’. The important point to note here is that the fittest 
individuals are those with a reproductive advantage; they produce more 
offspring than less fit individuals. Consequently those organisms that 
survive and reproduce most effectively are the ones that pass on their 
own favourable gene variants, known as alleles, to the succeeding 
generations. In contrast, an individual that performs poorly in the 
environment is less likely to generate a large number of offspring, and its 
alleles will tend to become less frequent in the population. 


Natural selection therefore acts on individual organisms to influence 
populations of species over time. This is the process of evolution; that 
is, the change over time in the inherited characteristics or traits found in 
populations of individuals. Alternatively, evolution has been defined 
more specifically as the change in frequency of particular alleles within 
the gene pool of a population from one generation to the next. Darwin’s 
classic theory was that natural selection, and evolution, will occur if 
three conditions are met: 


1 There is a struggle for existence. Within a species, more individuals 
are usually produced by reproduction than will survive within 
environmental constraints (e.g. food supply), so competition ensures 
that not all of the individuals will survive to produce offspring. 

2 There is variation between individuals within a species. Those 
individuals with advantageous characteristics (the fittest) have a 
greater probability of survival, and therefore of reproducing and 
passing on their alleles, in the struggle for existence. 

3. The characteristics of an individual are heritable. Advantageous 
characteristics that promote survival are inherited by the offspring, so 
individuals possessing those characteristics will become increasingly 
common in the population over successive generations because they 
are more likely to survive and reproduce than individuals not 
possessing those characteristics. 
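The definition of evolution as a change in allele frequency from one generation to the next can be illustrated with a small numerical sketch. It is a deliberately simplified model, not one given in this book: it assumes a very large population of organisms that each carry a single copy of the gene, two alleles A and B, and a fixed relative number of offspring for the carriers of each allele (`fitness_a` and `fitness_b`, hypothetical values chosen for illustration).

```python
def next_frequency(p: float, fitness_a: float, fitness_b: float) -> float:
    """Frequency of allele A in the next generation.

    Carriers of A leave offspring in proportion to p * fitness_a and
    carriers of B in proportion to (1 - p) * fitness_b.
    """
    mean_fitness = p * fitness_a + (1.0 - p) * fitness_b
    return p * fitness_a / mean_fitness


def simulate(p0: float, fitness_a: float, fitness_b: float, generations: int) -> list[float]:
    """Return the frequency of allele A in every generation, starting from p0."""
    frequencies = [p0]
    for _ in range(generations):
        frequencies.append(next_frequency(frequencies[-1], fitness_a, fitness_b))
    return frequencies


# Allele A starts rare (1%) and its carriers leave 10% more offspring.
history = simulate(p0=0.01, fitness_a=1.1, fitness_b=1.0, generations=100)
for generation in (0, 25, 50, 75, 100):
    print(generation, round(history[generation], 3))
```

Under these assumptions even a modest reproductive advantage takes a rare allele to a large majority of the population within about a hundred generations, while with equal fitness values the frequency does not change at all.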


Summary of Section 1.1

• The prevailing theory for the origin of life on Earth is that life’s building
blocks (organic molecules) formed from chemical reactions between
simpler inorganic molecules that existed on the early Earth.

• Cells synthesise different types of macromolecules made up of organic
building blocks, e.g. proteins are composed of amino acids.

• The cells of all living organisms use DNA as the genetic information to
construct proteins. DNA is replicated and passed on to daughter cells as
the hereditary information that specifies the characteristics of the cell.

• The diversity of species has arisen as a result of natural selection, the
process that favours individuals with characteristics that increase their
chances of survival and reproduction, such that they pass on their
favourable gene alleles to subsequent generations. Evolution is the change
in frequency of inherited characteristics in a population over time.


1.2 Early cells 


In addition to possessing genetic material that encodes their protein 
components, another universal feature of present-day cells is that their outer 
limits are defined by a cell membrane, also known as the plasma membrane, 
which is composed largely of phospholipids (membranes are discussed in 
detail in Book 2). Phospholipids are molecules with a hydrophilic head (which 
is attracted to water) and a hydrophobic tail (which is repelled by water), so in 
an aqueous environment phospholipids tend to form bilayers (Figure 1.5a) in 
which the hydrophobic tails face away from the surrounding water, while the 
hydrophilic heads face outwards in contact with the water on either side of the 
bilayer. Such bilayers spontaneously adopt a spherical shape, effectively 
enclosing a portion of the surrounding aqueous medium (Figure 1.5b). 


[Figure 1.5 labels: aqueous solution; hydrophilic (polar) head; hydrophobic (non-polar) tails. Panels: (a), (b).] 


Figure 1.5 Bilayers formed by phospholipids in an aqueous environment. (a) The hydrophobic (‘water-hating’) tails 
of individual phospholipid molecules are separated from the aqueous medium while the hydrophilic (‘water-loving’) 
heads are in contact with the water molecules. (b) Spherical phospholipid bilayers (liposomes) form spontaneously, 
creating a separate aqueous compartment. 


Because lipid bilayers present a barrier to the movement of substances, they 
effectively create a separate aqueous compartment, protected from the 
surrounding environment. On the early Earth, such structures may have 
formed around replicating nucleic acids, proteins and other organic molecules, 
promoting further chemical and physical interactions between them and 
concentrating the products of these reactions inside the phospholipid envelope. 
Eventually this enclosed collection of molecules and processes would have 
become a replicating unit, capable of reproducing offspring with the same 
structure and components: a recognisable cell. 


There are four classic laws of thermodynamics to which all matter in the 
universe is subject. They are identified as the Zeroth, First, Second and 
Third Laws. The First Law of Thermodynamics essentially states that energy 
can neither be created nor destroyed, only converted between different 
forms, for example chemical bond energy into heat energy or kinetic 
(movement) energy. 


Several mechanisms have evolved for controlling the movement of substances 
across the plasma membranes of cells, enabling them to regulate and optimise 
intracellular conditions (i.e. those inside the cell). Present-day plant and 
animal cells also contain internal (intracellular) phospholipid membranes that 
enclose subcellular compartments with their own characteristic contents and 
specialised functions. These subcellular compartments are known as 
organelles and include the nucleus and mitochondria, which are described in 
Section 1.2.4, and in more detail in Chapter 3 of this book. Critically, in terms 
of cellular evolution, the development of organelles allowed primitive cells to 
diversify in terms of utilisation of different energy sources, and increased the 
efficiency of energy capture and other processes. 


(LOs 1.1 and 1.4) Allow 10 minutes 
Return to Activity 1.1 and watch Part 4 of the video. 


1.2.1 Energy and the biosynthesis of molecules 


In order both to support life and to reproduce themselves, cells must 
synthesise essential cell components. To do this, they need a source of raw 
materials that they can convert into macromolecules such as nucleic acids and 
proteins. To drive the chemical reactions necessary to synthesise new 
polymers, cells also need energy. 


Many current-day organisms are heterotrophs and need to ‘feed’ on existing 
organic macromolecules, such as proteins, carbohydrates and lipids, in the 
form of other organisms or organic matter in their environment. All cells carry 
out biochemical reactions that break down organic macromolecules (these are 
called catabolic reactions) and then use the monomers released to synthesise 
new polymers (although some polymers are synthesised from scratch using 
smaller components). Cells can also degrade the monomers further to release 
the energy stored in the chemical bonds. This energy is then stored in energy 
carrier molecules such as adenosine 5'-triphosphate (ATP) (Figure 1.6). ATP 
can move around the cell and provide the energy that drives energy-requiring 
biochemical reactions (called anabolic reactions) that construct molecules 
from smaller units. ATP also drives other energy-requiring activities, such as 
movement. You will learn more about the mechanisms cells use to generate 
ATP in Book 2. The entirety of the biochemical activity that maintains life 
and allows the cell to grow, reproduce and respond to its environment is 
referred to as the cell’s metabolism. 


Figure 1.6 The chemical structure of ATP. Energy is stored in the bonds 
connecting the two terminal phosphate groups to the rest of the molecule. 


Again, there are many theories about how these processes took place in early 
cells. If the earliest ideas described in Box 1.1 were correct, these primitive 
cells would have been heterotrophs; that is, they would have relied on an 
environment that was already rich in organic molecules such as amino acids, 
and simple carbohydrates and lipids, to provide a ready source of raw 
materials and energy, as described above. If the Earth’s atmosphere was 
anaerobic (lacking in oxygen) at that time, the first energy-generating 
mechanism may have been similar to present-day glycolysis, the catabolic 
pathway that converts the simple sugar glucose into smaller molecules, 
releasing energy in the form of ATP (glycolysis will be described in Book 2). 


Rapidly growing populations of heterotrophs would have quickly used up the 
available nutrients, and life may have been in danger of dying out, were it not 
for the appearance of autotrophs, which were able to synthesise complex 
organic molecules from scratch using inorganic molecules and energy from 
their environment. Autotrophs living deep in the oceans, or underground, 
could have obtained their energy from catabolic chemical reactions involving 
small (inorganic) molecules, such as hydrogen sulfide (H₂S). 


■ What is a source of environmental energy for many surface-living 
autotrophs? 


□ Sunlight. 


The process by which organisms such as present-day plants synthesise organic 
molecules from atmospheric carbon dioxide (CO₂) using the energy from 
sunlight is called photosynthesis. As you will learn in Book 2, the reactions 
of photosynthesis ultimately produce free molecular oxygen (O₂). It is likely 
that early photosynthetic autotrophs caused a huge increase in the amount of 
oxygen in the Earth’s atmosphere, sometimes referred to as ‘The Great 
Oxygenation Event’ or ‘Oxygen Crisis’, which occurred about 2500 Ma ago. 
Oxygen is a highly reactive molecule and this change would have facilitated a 
much wider range of chemical reactions, including the development of a much 
more efficient mechanism of generating ATP called aerobic respiration. 




The overall chemical equation for aerobic respiration of glucose is: 


$$\mathrm{C_6H_{12}O_6} + 6\,\mathrm{O_2} = 6\,\mathrm{CO_2} + 6\,\mathrm{H_2O} + \text{energy as ATP}$$
glucose + oxygen = carbon dioxide + water + energy as ATP 


Aerobic respiration in fact involves a whole series of chemical 
reactions which together produce much more ATP from the breakdown of a 
sugar molecule than anaerobic glycolysis. Aerobic respiration is consequently 
the main mechanism used by most present-day cells to drive energy-requiring 
processes, and it is described in more detail in Book 2. 
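One way to check that the overall equation above is balanced is to count the atoms of each element on both sides. The short Python sketch below does this bookkeeping. The atom counts are written out by hand from the chemical formulae, and the function and variable names are illustrative only. The energy released as ATP is not an atom count, so it is left out.

```python
from collections import Counter

# Atom counts per molecule, written out from the chemical formulae.
GLUCOSE = {"C": 6, "H": 12, "O": 6}   # C6H12O6
OXYGEN = {"O": 2}                     # O2
CARBON_DIOXIDE = {"C": 1, "O": 2}     # CO2
WATER = {"H": 2, "O": 1}              # H2O


def total_atoms(side):
    """Sum atom counts over (coefficient, molecule) pairs."""
    totals = Counter()
    for coefficient, molecule in side:
        for element, count in molecule.items():
            totals[element] += coefficient * count
    return dict(totals)


reactants = [(1, GLUCOSE), (6, OXYGEN)]
products = [(6, CARBON_DIOXIDE), (6, WATER)]

if __name__ == "__main__":
    print("reactants:", total_atoms(reactants))
    print("products: ", total_atoms(products))
    print("balanced: ", total_atoms(reactants) == total_atoms(products))
```

Both sides contain 6 carbon, 12 hydrogen and 18 oxygen atoms, so the equation is balanced as written.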


1.2.2 Cell evolution 


It is uncertain at what point all of the accumulated processes that have been 
described here would have culminated in a recognisable living cell. 


■ The video in Activity 1.1 (Part 2) states that all present-day life on Earth 
evolved from a single ancestral living cell. Do you think it is likely that 
life appeared only once? 


□ It is impossible to know, but living cells may have emerged on many 
occasions as the result of the processes described in this chapter. 


However, the processes of natural selection and evolution would probably 
have ensured that most of these early forms of life are not represented among 
present-day living organisms. The similarities between living cells in the 
current era (Box 1.3) suggest instead that they may have developed from one 
or a very few successful ancestral lines. The classification of modern 
organisms on the basis of their evolutionary relationships (Box 1.4) usually 
assumes that they can all be traced back to a successful common ancestral cell 
(a last universal common ancestor, or LUCA, Box 1.3). This is not to say that 
LUCA was the first and only living cell to emerge; it is simply the earliest 
recognisable ancestor of present-day living organisms. 




have resembled present-day bacteria (Section 1.2.3), with a simple 
membrane that separated it from the external environment and allowed 
some control over what went into and came out of the cell. The nature of 
LUCA’s energy source is uncertain, but if it existed before the Great 
Oxygenation Event it must have been an anaerobe, an organism that does 
not require oxygen for its metabolism. Whether LUCA’s genetic material 
was DNA or RNA is also unknown. 


The evolution of different metabolic pathways for using different sources of 
energy and raw materials would have allowed early cells to adapt to and 
colonise different environments, and eventually resulted in the many different 
species of organisms that exist today. All present-day organisms can, however, 
be classified into one of three distinct ‘domains’: the Bacteria, Archaea and 
Eukarya (Box 1.4). The classification of organisms (an area of science known 
as taxonomy) is informed by the study of evolutionary relationships between 
species (which is known as phylogeny) (Box 1.4). 


Box 1.4 Taxonomy and the tree of life 


For centuries, humans have attempted to understand the diversity of 
living organisms by developing various classification systems based on 
their physical characteristics. Modern classification increasingly reflects 
the evolutionary relationships between organisms (phylogeny) by also 
taking into account similarities between their genomes. The classification 
of many organisms has been, and still is, subject to change as scientific 
research informs ideas about evolutionary relationships. 


The lowest unit in the hierarchy of classification is the species (Box 1.2), 
and closely related species are grouped into genera (singular genus). 
Carolus Linnaeus, the 18th-century Swedish naturalist, first popularised a 
simplified system of naming organisms, where each species name has 
two parts: the genus name and the species name. In print, these names 
are italicised. For example, modern humans are Homo sapiens, in which 
Homo is the genus and sapiens the species. Note that the names of 
organisms are often abbreviated, so the name of the common laboratory 
bacterium Escherichia coli is usually written E. coli and Homo 
sapiens as H. sapiens. Many species also have a more familiar common 
name, such as human in the case of H. sapiens. 
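The abbreviation rule described above (first letter of the genus, a full stop, then the species name) is simple enough to express as a short Python sketch. The function name `abbreviate` is made up for illustration, and the sketch assumes the input is a plain two-part name separated by a space.

```python
def abbreviate(binomial):
    """Abbreviate a two-part species name, e.g. 'Homo sapiens' -> 'H. sapiens'."""
    # Raises ValueError if the name does not have exactly two parts.
    genus, species = binomial.split()
    return f"{genus[0]}. {species}"


if __name__ == "__main__":
    for name in ("Homo sapiens", "Escherichia coli", "Felis catus"):
        print(name, "->", abbreviate(name))
```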


All taxonomic systems have a hierarchy of increasingly larger groupings 
as the hierarchy is ascended. Figure 1.7 illustrates this hierarchy by 
considering a familiar animal, the domestic cat, Felis catus or F. catus. 
The domestic cat is one of several species belonging to the genus Felis, 
which is part of a larger grouping, the family Felidae. Cats are meat-eating 
carnivores (order Carnivora), and are warm-blooded, hairy animals that bear live 
young, and are therefore in the class Mammalia. All mammals have 
backbones and hence are grouped with other ‘bony’ animals such as 
fishes and reptiles into a major rank in the hierarchy: the phylum 
Chordata (the plural of phylum is phyla). This phylum is grouped with 
all other animals into the kingdom Animalia. Again, kingdom is another 
large grouping with readily observable distinguishing features. Thus it is 
relatively easy to distinguish an animal from a plant (kingdom Plantae), 
as animals move and feed on other organisms, whereas plants are usually 
stationary and need light to grow. 
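The hierarchy described for the domestic cat can be written down as an ordered list, lowest rank first. The Python sketch below uses only the ranks and group names given in the text. The variable and function names are illustrative, and the list is a simplification that leaves out any intermediate ranks.

```python
# Ranks from lowest to highest, with the domestic cat's group at each rank,
# as given in the text.
CAT_CLASSIFICATION = [
    ("species", "Felis catus"),
    ("genus", "Felis"),
    ("family", "Felidae"),
    ("order", "Carnivora"),
    ("class", "Mammalia"),
    ("phylum", "Chordata"),
    ("kingdom", "Animalia"),
]


def groups_above(rank, classification=CAT_CLASSIFICATION):
    """Return the groups at every rank higher than the given rank."""
    ranks = [r for r, _ in classification]
    position = ranks.index(rank)  # raises ValueError for an unknown rank
    return [group for _, group in classification[position + 1:]]


if __name__ == "__main__":
    for rank, group in CAT_CLASSIFICATION:
        print(f"{rank:8s} {group}")
    print("Above order:", groups_above("order"))
```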




Figure 1.7 Diagram showing how Felis catus is grouped into 
successively higher levels of the hierarchical classification 
system. 


Evolutionary relationships between organisms are often illustrated in 
diagram form by a ‘tree of life’ or phylogenetic tree (Figure 1.8). The 
highest level of hierarchy in the phylogenetic tree is the domain, of 
which there are three: Bacteria, Archaea and Eukarya. All of the 
members of the domains Bacteria and Archaea are prokaryotes (their 
cell(s) are of a type called prokaryotic, described in the next section) 
and they are almost all unicellular. The domain Eukarya contains the 
predominantly multicellular kingdoms Animalia (animals), Plantae (land 
plants) and Fungi (e.g. moulds, yeasts), together with a rather loosely 
defined group called the protists (e.g. algae, amoebae). The protists don’t 
fit into the other kingdoms, and are predominantly unicellular (although 
some are multicellular). The Eukarya are all eukaryotes, composed of a 
type of cell described as eukaryotic (Section 1.2.4). 


On a ‘tree of life’ diagram (Figure 1.8), the closer together the branch 
points, the closer the relationships, and the more recent the evolutionary 
divergence. So, for example, animals and fungi are more closely related 
to each other than either kingdom is to the land plants. The most distant 
relationship is between the three domains, which diverged a very long 
time ago in evolutionary history. 
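The idea that a more recent branch point means a closer relationship can be sketched in Python. The tree below is a deliberate simplification and rests on several assumptions: the three domains are drawn as branching directly from the last common ancestor, the protists are omitted, and the label 'animal-fungus ancestor' is a hypothetical name for the branch point shared by animals and fungi. It is not a reproduction of Figure 1.8.

```python
# Child -> parent links for a deliberately simplified tree.
PARENT = {
    "Bacteria": "last common ancestor",
    "Archaea": "last common ancestor",
    "Eukarya": "last common ancestor",
    "land plants": "Eukarya",
    "animal-fungus ancestor": "Eukarya",  # hypothetical label for a branch point
    "animals": "animal-fungus ancestor",
    "fungi": "animal-fungus ancestor",
}


def lineage(node):
    """Path from a node back to the root, starting with the node itself."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def most_recent_common_ancestor(a, b):
    """First branch point shared by the lineages of a and b."""
    ancestors_of_b = set(lineage(b))
    for node in lineage(a):
        if node in ancestors_of_b:
            return node
    raise ValueError("nodes are not in the same tree")


def branch_depth(node):
    """Number of steps from the root; larger means a more recent branch point."""
    return len(lineage(node)) - 1


if __name__ == "__main__":
    for pair in (("animals", "fungi"), ("animals", "land plants"),
                 ("animals", "Bacteria")):
        ancestor = most_recent_common_ancestor(*pair)
        print(pair, "->", ancestor, "at depth", branch_depth(ancestor))
```

In this sketch the branch point shared by animals and fungi lies further from the root than the one they share with land plants, which is the sense in which they are 'more closely related'.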




[Figure 1.8 labels: Eukarya; Archaea; Bacteria; last common ancestor.] 


Figure 1.8 The ‘tree of life’. This illustrates how all life is related and can be grouped into three domains (in this 
case, based on similarities between the sequences of ribosomal RNA (rRNA) genes, which you will learn more about 
in Chapter 6). For simplicity, not all the branches are labelled. 


1.2.3 The prokaryotes 


As mentioned in Box 1.4, the Bacteria and Archaea are all prokaryotes. 
Prokaryotic cells, the Archaea in particular, have a deceptively simple 
structure, which belies the biochemical complexity and diversity of these two 
domains. The Bacteria do, however, show some structural diversity; for 
example, some of the actinobacteria form multicellular filaments (Section 2.3). 
Some features of a typical prokaryotic bacterium are illustrated in Figure 1.9. 
The most distinctive feature of prokaryotes is that they have no nuclear 
membrane separating the nucleoid, the region containing their single circular 
DNA molecule, from the rest of the cell (the name prokaryote comes from the 
Greek pro meaning ‘before’, and karyon meaning ‘kernel’). The substance that 
fills the cell is called the cytoplasm, which comprises the intracellular fluid 
called cytosol and all the cell components within it, including a large number 
of ribosomes, the complex structures where the synthesis of proteins takes 
place (Section 1.1.2). The cytoplasm is enclosed by a cell membrane (or 
plasma membrane) which consists of a lipid bilayer (Figure 1.5) with proteins 
embedded in it. Outside the cell membrane, many bacteria have a rigid cell 
wall which provides structural support. Some bacteria also have another, outer 
membrane surrounded by a slimy polysaccharide capsule which helps protect 
the cell. The cell wall is permeable to most molecules, but the cell membrane 
provides a selective barrier between the inside of the cell and its external 
environment and controls the passage of molecules into and out of the cell 
(Book 2). 


Figure 1.9 Schematic diagram showing the general organisation of a typical 
bacterium. 


1.2.4 The eukaryotes 


Eukaryotic cells (Figure 1.10) have a more complex structure than those of 
prokaryotes. Like prokaryotic cells, they have cytoplasm bounded by a cell 
membrane, but unlike prokaryotic cells, their DNA is packaged into distinct 
structures called chromosomes which are enclosed within a large organelle 
called the nucleus, itself bounded by a nuclear membrane. In addition, the 
cytoplasm of eukaryotic cells contains several other types of specialised 
membrane-bound organelles. These include mitochondria, where the majority 
of the ATP (the universal form of chemical energy, Section 1.2.1) is made, 
and, in plants and algae, plastids, which are responsible for the synthesis and 
storage of food. For example, plant chloroplasts are a type of plastid where 
photosynthesis takes place (Chapter 3). 


How did membrane-bound organelles arise? The widely accepted 
endosymbiotic theory proposes that certain organelles, including 
mitochondria, originated from bacteria that were taken up inside early cells 
and lived initially as symbionts (organisms that live together for mutual 
benefit). There is evidence for this theory in the many similarities between 
these organelles and present-day bacteria, including: 

• similarities between the composition of the membranes surrounding 
organelles and the plasma membranes of bacteria 

• the similarity in the way that organelles and bacteria duplicate themselves 
by dividing in two (binary fission; Chapter 4) 

• the fact that organelles contain DNA that is different to the cell’s DNA but 
very similar to bacterial DNA in organisation and (circular) shape. 

Once incorporated, the endosymbiotic organisms would have gradually lost 
many of their features, and down the generations both organisms would 
eventually have become incapable of surviving independently. 




Figure 1.10 Schematic diagram showing the general organisation of a typical 
eukaryotic animal cell. 


1.2.5 Multicellularity 


The earliest life was probably unicellular, but the diversity of modern 
multicellular organisms, including plants, fungi, animals, and some types of 
algae and bacteria, suggests that there were many separate transitions to 
multicellular life. The driving force was undoubtedly a survival advantage, 
including perhaps more effective ways of reproducing, avoiding predators, 
obtaining food, and dividing labour between cells. 


The route to multicellularity was probably different for different organisms: 
multicellularity could have arisen either because daughter cells failed to 
separate after cell division, or through the colony-forming behaviour of 
individual cells. 




Present-day bacteria, yeast and moulds are unicellular, but prefer a colonial 
mode of growth when a solid surface is available. Each colony of cells arises 
from a single cell which undergoes multiple rounds of cell division to form a 
pile of cells all of the same type, visible to the human eye and containing 
millions of individuals (Figure 1.11a). This behaviour can be mutually 
advantageous for feeding: for example, by allowing the colony collectively to 
produce sufficient digestive enzymes to break down molecules in the 
substance supporting the colony. 
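The arithmetic behind a colony of millions arising from one cell is worth making explicit. The Python sketch below assumes, as a simplification not stated in the text, that every cell divides in two at each round and that no cells die. The function names are illustrative.

```python
def cells_after(rounds):
    """Number of cells after the given number of rounds of division,
    starting from a single cell and assuming every cell divides each round."""
    return 2 ** rounds


def rounds_needed(target_cells):
    """Smallest number of rounds of division that gives at least target_cells."""
    rounds = 0
    while cells_after(rounds) < target_cells:
        rounds += 1
    return rounds


if __name__ == "__main__":
    print("cells after 10 rounds:", cells_after(10))
    print("rounds needed for one million cells:", rounds_needed(1_000_000))
```

Under these assumptions, 20 rounds of division are enough to pass one million cells, since 2 multiplied by itself 20 times is 1 048 576.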


An example of a different type of colony-forming behaviour is that of the 
soil-living eukaryotic cellular slime mould Dictyostelium discoideum which 
normally exists as a single-celled amoeba (Chapter 2), a type of protist that is 
able to change shape. However, in starvation conditions, the individual cells 
signal to each other to aggregate and form a motile multicellular ‘slug’ in 
which the cells undergo differentiation; that is, they start to differ from each 
other and become more specialised to form a fruiting body, a reproductive 
structure in which some individuals make a long stalk and others produce 
spores which are dispersed into the air (Figure 1.11b). Spores that fall in a 
more favourable environment will develop into new amoebae. 


The first recognisably multicellular organisms may have been a type of sponge 
(Figure 1.11c). Sponges are aquatic animals that evolved from single-celled 
protists, and evidence of them exists in rocks at least 640 Ma old. Their 
bodies consist of a network of canals and chambers connected to open pores 
on their surface (hence the name of their phylum, Porifera) and they feed by 
drawing in water and filtering out nutrient particles. Unlike more complex 
multicellular animals, sponges have no specialised organs, and if certain 
sponge species are pushed through a fine sieve to separate all their cells, then 
given time, the individual cells will start to re-aggregate, eventually forming 
complete new sponges. They do, however, have several types of specialised 
cells, including cells called choanocytes, or collar cells, which line the canals. 
Each choanocyte has a sticky, funnel-shaped collar and a flagellum. The 
flagella all beat back and forth, forcing water through the canals, and the 
sticky collars of the cells pick up tiny particles of food brought in with 
the water. 


Some of the sponge cells are totipotent, meaning that they are able to change 
to become another type of cell as required. Totipotency is a feature also 
exhibited by human embryonic stem cells, prompting the current interest in 
developing stem cell therapy for patients with certain types of cell damage 
(Book 3). Most of the cells in a multicellular organism, in contrast, are 
destined to develop into one particular cell type; and once they are fully 
differentiated they are no longer able to change into another form of cell, or to 
divide. The mechanisms that control differentiation will be discussed in 
Book 3. 


In more complex multicellular organisms, including most modern animals and 
plants, cells of different types are organised into highly structured tissues 
(e.g. the animal gut, which you will learn more about in Chapter 2) in which 
cells have different specialised functions and depend on each other to fulfil all 
the functions needed to maintain the organism. Such a ‘true’ multicellular 
organism is recognisable as a discrete individual with the following 
characteristics that differentiate it from a cell colony: 


• A multicellular organism develops from a single cell by cell division, to 
form a single individual consisting of different cell types. 


• All of the cells in the developing organism therefore possess the same 
genetic information, so an individual cell’s type is determined by its 
pattern of gene expression; that is, different cell types use different 
subsets of genes to synthesise the proteins that confer their particular 
characteristics. This differential expression of genes drives the process of 
differentiation. 


• The cells interact and communicate in such a way that separating them 
causes injury to the organism. 


• The organism coordinates a programme of repair and replacement of cells 
throughout its life in order to maintain its overall integrity. 


Figure 1.11 Colony-forming behaviour and multicellularity. (a) Bacterial colonies 
growing on an agar plate. (b) Scanning electron micrograph (Chapter 3) showing 
the stages of Dictyostelium discoideum aggregation to form fruiting bodies in 
which the original amoebae form stalk cells and spores. (c) A tube sponge. 




Summary of Section 1.2 


• Water is essential for life because of its chemical properties as a solvent 
and the propensity for phospholipids to form bilayers in an aqueous 
environment, which is a prerequisite for the formation of cell membranes. 


• All living organisms are made up of one or more cells, each bounded by a 
cell membrane. 


• All living organisms obtain energy from their environment, either through 
the breakdown (catabolism) of molecules, or as light energy for 
photosynthesis, and they convert it into chemical bond energy in 
energy-rich molecules such as ATP. The chemical energy is required to drive the 
synthesis (anabolism) of new molecules and for other energy-requiring 
activities such as movement. 


• All living organisms grow and reproduce themselves by cell division. 


• The cells of prokaryotes are distinguished by their relatively simple 
structure and lack of internal membrane-bound organelles. 


• The cells of eukaryotes contain membrane-bound organelles with specialist 
functions, e.g. the nucleus and mitochondria. 


• Complex multicellular organisms consist of many distinct cell types that 
develop from a single cell through the process of cell division and 
differentiation. In true multicellular organisms, cells interact and 
communicate in such a way that separating them will cause injury or death 
of the organism. 


1.3 Synthetic life? 


This chapter has identified some of the defining characteristics of living cells 
through a brief exploration of their early evolution. Of course, we cannot 
know for certain how living organisms first emerged, and no experiment has 
so far unequivocally created life in the laboratory. 


As the understanding of the processes of living cells has grown, scientists 
have developed techniques for manipulating the characteristics of cells and 
have even begun to take steps towards the goal of custom-building a cell with 
defined properties. In 2009, scientists at the J. Craig Venter Institute, 
Maryland, USA demonstrated that they could chemically synthesise, in a test 
tube, the complete DNA genome of a type of bacterium called Mycoplasma 
mycoides, then introduce it into a different species of bacterium called 
Mycoplasma capricolum, replacing the original genome. 


■ Would you expect the resulting living organisms they obtained to have 
the characteristics of Mycoplasma mycoides or Mycoplasma capricolum? 


□ They should have the characteristics of M. mycoides because the genetic 
material that directs the construction of the cell’s proteins and determines 
the characteristics of the cell would be from that organism. 


The ‘synthetic’ bacterium (nicknamed Synthia) was viable and able to grow 
and multiply. In theory, this work opens the way to synthesising a cell with 
completely new properties; but chemically synthesising a whole genome, even 
a small one, is an expensive and laborious task, compared with modifying an 
existing organism. 


George Church and his team at Harvard University have taken an alternative 
approach. At the time of writing (2012), they have assembled, in a test tube, 
one key component of living cells, the ribosome. 


■ What is the function of the ribosome? 


□ The ribosome is the cell component that translates the information 
encoded in the genome to synthesise new protein molecules (Figure 1.4). 


These researchers isolated the large number of individual protein and RNA 
molecules required to construct functional ribosomes in the bacterium 
Escherichia coli and mixed them together to form synthetic ribosomes that 
were able to synthesise new proteins. Their goal is eventually to create a 
ribosome that can replicate itself and act as a protein factory in a test tube. 
They have identified 151 genes that they think are sufficient to encode a 
self-reproducing ribosome, including genes for ribosomal proteins and RNAs, 
enzymes that catalyse different reactions in protein synthesis, and some 
additional genes not directly related to the ribosome. ‘We think this is enough 
genes to replicate DNA, produce RNA and ribosomes,’ says Church. ‘Once 
you get it going, it should be able to keep going if you supply it with amino 
acids and nucleotides’. 


(LOs 1.2, 1.3 and 1.4) Allow 20 minutes 
Return to Activity 1.1 and watch Part 5 of the video. 


1.4 Final word 


This chapter has considered some of the essential characteristics of living cells 
and some theories of how cells may have arisen on the early Earth. Clearly, 
many questions remain about how cells evolved. In present-day cells, DNA is 
the universal hereditary molecule that directs the production of proteins, but 
the replication of DNA is itself a highly complex process, requiring proteins 
to carry it out. To synthesise macromolecules such as DNA and proteins, cells 
need to obtain and transform raw materials and energy. In multicellular 
organisms, different types of cells carry out different specialised processes and 
they must communicate in order to survive. How these challenges were met in 
primitive cells and organisms is largely unknown. Although a great deal has 
been discovered about many of these processes and properties in present-day 
cells, we still have a lot to learn about how they combine to produce living 
cells and tissues. The next two chapters consider cells in more detail and look 
at some of the structural and biochemical differences that underlie their 
great diversity. 




1.5 Learning outcomes 


1.1 Describe some of the ideas and theories about how organic molecules and 
cells were first formed on Earth. 


1.2 Describe how hereditary information is transferred from one cell 
generation to the next. 


1.3 Explain the general role of cellular metabolism. 


1.4 Describe the general features of eukaryotic and prokaryotic cells. 


Chapter 2 An introduction to cell diversity 




2.1 Introduction 


You have already learnt that all cells are composed of the same kinds of 
molecular building blocks and share some common features. Despite these 
common features, cellular diversity is enormous, both between different types 
of organism and within individual multicellular organisms. 


■ From your study of Chapter 1, what are some of the common properties 
of cells? 


□ Some common properties of cells are that they: 


• use the same kinds of carbon-based macromolecules as basic components 
(proteins, lipids, carbohydrates and nucleic acids) 


• use DNA as their genetic material, which they ‘decode’ to make proteins 
• are enclosed by a membrane 
• require a constant supply of energy. 


You may have included another attribute of living organisms, the ability to 
grow and reproduce, but note that within adult multicellular organisms some 
of the individual specialised cells have lost their ability to divide; for example, 
most mature nerve cells (also known as neurons) are unable to divide. 


In Chapter 1, you also learnt something about the main features of ‘typical’ 
prokaryotic and eukaryotic cells. 


■ What are the main differences between prokaryotic and eukaryotic cells? 


□ Eukaryotic cells have a number of specialised organelles each enclosed 
by its own intracellular membrane. In eukaryotic cells the DNA is 
separated from the cell cytoplasm because it is enclosed by a nuclear 
membrane, forming a large organelle called the nucleus. Prokaryotic cells 
have no membrane-bound organelles, and their DNA is not separated 
from the cell cytoplasm. 


■ Aside from the nucleus, name two other cell organelles in eukaryotes, 
and state their main functions. 


□ You may have thought of mitochondria and chloroplasts. The 
mitochondria generate most of the cell’s supply of ATP (Section 1.2.1). 
Chloroplasts are the sites where energy from light is used to convert 
carbon dioxide into organic compounds in the cells of photosynthetic 
eukaryotes (Section 1.2.4). 


Eukaryotic cells contain a number of other organelles, which you will learn 
about in Chapter 3. 


When considering cells, both prokaryotic and eukaryotic, it is important to 
consider their environment. Few cells exist as isolated entities; most are part 
of a cellular community, which therefore forms an important aspect of a cell’s 
environment. As described in Chapter 1, the nature of cell communities varies. 
Single-celled organisms, for example bacteria, yeasts and some protists, 
frequently live in colonies. A widely used definition of a colony is that it is a 
group of individual organisms that are linked together either by living 
extensions of their bodies (e.g. cytoplasmic strands) or by non-living material 
that they have secreted. This definition is rather imprecise, but implies that 
although there may be communication between members of a colony, there is 
little or no difference between them; individual cells in a colony are 
functionally equivalent and could survive and form a new colony if separated 
from other colony members. 


Some simple organisms are considered to be multicellular; that is, unlike 
colonies, they are composed of different types of cells. Examples include 
animals such as sponges, which are essentially aggregates of a small number 
of different types of cells (Section 1.2.5). In some cases, however, it can be 
difficult to be certain if an organism should be classified as colonial or 
multicellular. You will encounter some more examples of this distinction later 
in the chapter. 


The different types of cell in more complex multicellular organisms are 
specialised to perform particular functions, such as movement, photosynthesis 
or secretion. Different molecules, particularly but not exclusively proteins, 
play an important role in these specialised functions. In plants, for example, 
some cells have the molecular apparatus that allows them to carry out 
photosynthesis. In animals, muscle cells synthesise specific proteins that 
enable them to contract, while non-contractile cells, such as skin cells, do not 
synthesise these proteins. The differential expression of proteins is therefore 
fundamental to the characteristic properties of specialised cells, as you will 
discover throughout this module. 


In addition to differences in the biochemical properties of the various cell 
types in multicellular organisms, the shapes of different cell types also vary. 
In humans, for example, red blood cells are small and disc-shaped, whereas 
nerve cells (neurons) have long processes, called axons, some of which extend 
very long distances, for example from the spinal cord to the muscles of the 
toes. The structure, or form (i.e. the shape and appearance) of cells is known 
as cell morphology, and plays an important role in cell function, as you
will see.


= What is the name of the process by which cells become specialised? 


The process by which cells become specialised is known as 
differentiation (Section 1.2.5). 


Differentiation is a complex but fascinating process, which continues to be the 
subject of intense research. You will learn more about it in Chapter 1 of 
Book 3. 


In the complex multicellular eukaryotes, different cell types tend to be 
organised into distinct groups or ‘tissues’ according to their function. The 
most complex organisms have evolved highly organised arrangements of 
different types of cells and tissues into organs and organ systems that perform 
specific functions. Examples include the vascular system (a system of vessels
for transporting fluids) of plants, and the digestive system of animals.
Different tissues and organ systems are not described in any detail in this 
module, which instead focuses on just a few examples to illustrate the 
organisation and diversity of cells. 


A final point to note in this introductory section is that, apart from dormant 
cells such as those found in seeds and spores (agents of dispersal, typically 
associated with reproduction), all living cells are continually active. In 
addition to the obvious examples of physical activity exhibited by muscle 
cells and by motile cells such as sperm cells, at the molecular level all cells 
are highly dynamic. They continuously take up nutrients from their 
environment and use these as a source of energy and raw materials for 
synthesising new molecules; they transport molecules to different locations 
within the cell and eliminate waste molecules; and if conditions are right, 
many cells grow and divide. Some can change their shape, and all cells 
respond to changes in the environment and interact in various ways with other 
cells, by processes collectively known as ‘cell communication’, or ‘cell 
signalling’, which you will learn about in Book 2, Chapter 4. All these 
different processes involve constant movement of molecules, multiple 
coordinated biochemical events and in some cases major structural 
rearrangements within the cell. As well as these ‘housekeeping’ processes that 
take place in all cells, some specialised reactions occur only in particular
cell types.


In this chapter, some examples of cell diversity are considered. You will see 
that, despite having the same molecular ‘building blocks’, cells can be very 
different indeed in their structure; and you will learn throughout the module 
how, as a result of biochemical and structural specialisations, cells differ in 
their function. In order to fully appreciate the details of cell structure and 
function, it is important to have some basic understanding of how the 
properties of cells are studied. The next section will introduce some of the 
methods that are used to study cells. You will learn about more of the 
techniques that are widely used in cell and molecular biology throughout this 
module. 


Summary of Section 2.1 


* Despite their underlying uniformity of molecular and intracellular
organisation, cells are extremely diverse in structure and in function, live
in diverse environments and utilise diverse energy sources.


* Cells form communities. Some unicellular organisms form colonies. In
multicellular organisms, different cells are specialised to perform different
functions.


* In complex multicellular organisms, cells of a similar type are often
organised into tissues. In the most complex animals, different cells and
tissues are often found together in an organ or an organ system which is
specialised to perform a specific function(s).


* Except when in a dormant state (e.g. spores), all living cells are dynamic.
They interact with their environment in order to obtain a source of energy
and molecular building blocks, and respond to environmental changes.
Individual cells, particularly in multicellular organisms, receive and
respond to ‘signals’ from other cells. Many cells move within their
environment. All these processes require a myriad of coordinated
biochemical events within the cell.


2.2 How cells are studied: microscopy and cell 
culture 


There is not sufficient space here to describe all the many techniques that are 
used in the study of cells, but some key techniques are introduced as 
appropriate throughout the module. Here, two techniques are outlined: 
microscopy and cell culture. 


In the context of cell morphology, one technique has been of fundamental 
importance, and that is microscopy. Almost all cells are very small, too small 
to be seen with the naked eye, so it was only when lenses and microscopes 
were developed that cells were discovered, and the study of the cellular 
organisation of organisms began. The first simple microscopes were little more 
than individual glass lenses; they were rather like very small, but powerful, 
magnifying glasses. 


The first ‘compound’ microscopes (so called because they contain several 
lenses) were made early in the 17th century. Robert Hooke (1635-1703) has 
been called the greatest scientist of the 17th century. Alongside significant 
contributions to a range of science and technology disciplines, including 
biology, chemistry, physics, geology, architecture and astronomy, he devised 
the compound microscope and illumination system shown in Figure 2.1a, one 
of the best such microscopes of the time. With it he observed a diversity of 
objects, including insects and sponges, and he recorded them with accurate 
drawings and beautifully detailed notes. When Hooke examined thin slices of
cork under a compound microscope in 1665, he noticed small rectangular-shaped
structures (Figure 2.1b). He wrote:


...I could exceedingly plainly perceive it all to be perforated and porous... 
these pores, or cells, ... were indeed the first microscopical pores I ever saw, 
and perhaps, that were ever seen, for I had not met with any Writer or 
Person, that had made any mention of them before. 


Because they reminded him of monks’ cells, he named these structures ‘cells’. 
Although what Hooke was observing were the cell walls in dead cork tissue, 
he had effectively discovered plant cells and gained a first understanding of 
the basic structure of plant tissue. 


2.2.1 Observing small objects 


You are already aware of the wide range of sizes among different organisms, 
from ants to elephants for example, but it can be difficult to appreciate just 
how small individual cells and cell organelles are. A typical prokaryotic cell, 
for example, is about one micrometre in diameter and no more than a few 
micrometres long. If you are not familiar with these very small units of
measurement, you should now work carefully through Box 2.1 before
continuing with the chapter.


Figure 2.1 (a) Robert Hooke’s light microscope. (b) The cell walls of cork drawn
by Hooke.


Box 2.1 Units used for measuring the size of cells 


Because cells are so small, they need to be measured in much smaller 
units than those you may be familiar with. In science, the units used for 
measurement are known as SI units, which is an abbreviation for 
‘Système International d’Unités’ (International System of Units). You
will be familiar with the basic SI unit for length, the metre (abbreviated 
to m), and will know that different prefixes are used to denote multiples 
of a metre. For example, a ‘kilometre’ (km) is one thousand metres, 
while a ‘centimetre’ (cm) is one-hundredth of a metre and a ‘millimetre’ 
(mm) is one-thousandth of a metre. 


= How many millimetres are there in one centimetre?


A metre is made up of 100 cm, or 1000 mm; so there are 10 mm in 
1 cm. 


To get down to the scale of cells, a unit is needed that is one-thousandth
of a millimetre. This unit is the micrometre, abbreviated to µm (µ is the
Greek letter mu) and sometimes referred to as a micron.


= How many micrometres (µm) are there in one metre?


If there are 1000 µm in 1 mm, and 1000 mm in 1 m, there will be
1000 × 1000 = 1 000 000 µm in 1 m. So there are 1 million (or
10⁶) µm in 1 m.


The prefix ‘micro’ strictly speaking means one-millionth, but it is also 
used more generally, as in words like microbe, to mean very small. 
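
The conversions worked through in Box 2.1 can be checked with a few lines of Python. This is only a minimal sketch: the names `POWER_OF_TEN` and `convert` are illustrative choices, not part of the module materials, and 'um' is used in the code simply as an easy-to-type stand-in for µm.

```python
# Each unit of length expressed as a power of ten relative to the metre.
POWER_OF_TEN = {
    "km": 3,
    "m": 0,
    "cm": -2,
    "mm": -3,
    "um": -6,  # micrometre
    "nm": -9,  # nanometre
}


def convert(value, from_unit, to_unit):
    """Convert a length between units by shifting the power of ten."""
    shift = POWER_OF_TEN[from_unit] - POWER_OF_TEN[to_unit]
    return value * 10 ** shift


print(convert(1, "cm", "mm"))  # millimetres in one centimetre
print(convert(1, "m", "um"))   # micrometres in one metre
print(convert(1, "mm", "um"))  # micrometres in one millimetre
```

Working with powers of ten rather than decimal fractions keeps the arithmetic exact for whole-number conversions such as these.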




Eukaryotic cells are generally larger than most prokaryotic cells. Animal cells
typically measure about 10-50 µm in diameter while the diameter of a mature
plant cell is typically around 50-100 µm; however, some cells in eukaryotes
can be very large indeed. For example, the giant nerve cells of squids have
axons that can be nearly 1 mm in diameter (i.e. about 1000 times the diameter
of a typical bacterium) and are also very long, extending up to a metre in
length. The more typical axons of a large vertebrate are much thinner, at
around 2 µm in diameter, but they may be several metres in length (for
example, those in the legs of large vertebrates, that connect the feet with the
spinal cord).


= Taking into account the endosymbiotic theory for the evolution of 
organelles in eukaryotic cells (Section 1.2.4), what would you expect the 
size of a mitochondrion to be? 


You would expect mitochondria to be similar in size to typical bacteria,
i.e. 1 µm diameter and a few µm in length.


In animal cells, mitochondria are indeed usually about 1 µm in diameter
(although their length can be much greater than their diameter). You will learn 
more about the shape and size of mitochondria in Chapter 3. 


A schematic illustration comparing the relative sizes of some organisms, cells, 
organelles, molecules and atoms is shown in Figure 2.2. 




Figure 2.2 The relative sizes of cells, organelles, molecules and atoms, arranged on a logarithmic scale. The ranges 
of structures visible with the light and electron microscopes (Section 2.2.2 and Chapter 3) are also shown. Globular 
proteins are compact proteins with a roughly spherical shape. (Note that the illustrations are not drawn to scale.) 
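
To see why a logarithmic scale suits Figure 2.2, it can help to compute where a few of the sizes quoted in this section would fall on such a scale. The sketch below is illustrative only: the dictionary `SIZES_M` and the function `log_position` are names chosen here, and the entries are the approximate values given in the text, converted to metres.

```python
import math

# Approximate sizes in metres, taken from the values quoted in this section.
SIZES_M = {
    "typical bacterium (diameter)": 1e-6,
    "animal cell (lower end of range)": 10e-6,
    "plant cell (upper end of range)": 100e-6,
    "squid giant axon (diameter)": 1e-3,
}


def log_position(size_m):
    """Position of a size on a base-10 logarithmic scale (in metres)."""
    return math.log10(size_m)


for name, size in SIZES_M.items():
    print(f"{name}: 10^{log_position(size):.0f} m")

# A 1000-fold difference in size is only three steps on the scale.
steps = log_position(1e-3) - log_position(1e-6)
print(f"steps between bacterium and squid axon: {steps:.0f}")
```

Each whole step on the scale corresponds to a tenfold change in size, which is how objects differing by many orders of magnitude can share one diagram.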


2.2.2 How a light microscope works and what can be
seen: resolution 


There are nowadays many different types of microscope, ranging from the
basic to the very complex. Which structures can be seen using a microscope
depends upon its magnifying power, and this in turn depends upon the lenses
and the type of light used. A simple explanation of how a basic light (or
optical) microscope works can be found in Box 2.2.


Margin note: A logarithmic scale can be helpful when showing a wide range of
values on the same graph. On such a scale, each division is 10 times greater
than the previous one.




Figure 2.3 A conventional compound light microscope with a diagram
illustrating how the microscope focuses light on a specimen and transmits it
through the objective and eyepiece lenses to the observer. (Diagram labels:
eye, objective, specimen.)


Although high magnification can be achieved using a compound light 
microscope, what can actually be seen also depends on another factor, called 
resolution. The resolution (sometimes known as ‘resolving power’) of a 
microscope is the smallest distance by which two objects are separated and 
can still be seen as being separate (i.e. the two objects can be resolved; they 
do not appear as a single object). Visible light is part of the spectrum of 
electromagnetic radiation, which includes radio, microwave, infrared, visible 
light, ultraviolet, X-rays and gamma rays. Like all electromagnetic radiation, 
light behaves as a series of waves, and the distance between each wave, the 
wavelength, is constant and determines its properties. The wavelength of 
visible light (between about 400 and 750 nm) determines the maximum 
resolution of a light microscope; that is, it is not possible to see detail that is 
much smaller than the wavelength of the light. The best resolution possible
for a standard microscope using visible light is about 200 nm (0.2 µm). So,
even if very high magnification lenses are used, very small structures, 
including many organelles, cannot be distinguished (or resolved) from 
surrounding structures using a light microscope. 


= Could two closely adjacent but not overlapping small spherical structures
that are (a) 3 µm and (b) 0.05 µm apart be distinguished using a light
microscope?


The structures that are (a) 3 µm (i.e. 3000 nm) apart could be
distinguished (resolved), but structures that are (b) 0.05 µm apart
(i.e. 50 nm) could not be resolved.
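
The reasoning in this question can be written as a short check. The sketch below makes two assumptions that go beyond the text: it uses the Abbe approximation (resolution ≈ wavelength / (2 × numerical aperture)) to show where a figure of about 200 nm can come from, and it assumes a numerical aperture of 1.4, a value typical of a high-quality oil-immersion objective. The text itself states only the 200 nm limit. The function names are illustrative.

```python
def abbe_limit_nm(wavelength_nm, numerical_aperture):
    """Approximate smallest resolvable separation: wavelength / (2 * NA)."""
    return wavelength_nm / (2 * numerical_aperture)


def can_resolve(separation_um, limit_nm=200.0):
    """Return True if two objects this far apart (in micrometres) can be
    seen as separate, given a resolution limit in nanometres."""
    separation_nm = separation_um * 1000  # 1 micrometre = 1000 nm
    return separation_nm >= limit_nm


# Green light (about 550 nm); NA of 1.4 is an assumed, illustrative value.
print(f"estimated limit: {abbe_limit_nm(550, 1.4):.0f} nm")

print("(a) 3 um apart resolved?", can_resolve(3))
print("(b) 0.05 um apart resolved?", can_resolve(0.05))
```

The check itself needs only the 200 nm figure from the text; the Abbe estimate is included to show that the limit scales with the wavelength of the light used.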


Details of intracellular structures, including organelles, are studied using
electron microscopes, which use a beam of electrons instead of light. The
electron beam has a much shorter wavelength than visible light. You will learn
about electron microscopy, and about organelles, in Chapter 3.


2.2.3 Light microscopy of cells and tissues 


Small organisms, such as bacteria and many protists, and also individual 
eukaryotic cells such as blood cells or cultured cells, can be viewed under a 
light microscope simply by placing them between two glass surfaces. 
Typically a microscope slide and thin glass ‘coverslip’ are used for this. 


For larger organisms, or for parts of organisms such as the stem of a plant, or 
a sample of a tissue or organ taken for research or diagnostics (known as a 
biopsy), the process is not so straightforward. 


= Can you think of a problem in studying cells within a tissue such as the
stem of a plant, or the muscle of a vertebrate?


The tissue may be very thick, so it may be difficult for light to pass
through it.


The study of the organisation of complex plant and animal tissues by 
microscopic analysis of tissue sections is known as histology (from the Greek 
word histos, meaning ‘tissue’ or ‘web’). In order to allow the cells in thick 
tissue samples to be studied, the samples are cut into very thin slices, known 
as sections. For light microscopy, tissue sections are typically between about
5 and 50 µm thick. This tissue sectioning is carried out using special
equipment, usually after the tissue sample has either been frozen (when it is
sectioned using a cryostat) or been embedded in a supporting
material such as wax (when it is sectioned using a microtome).


A further complication in the study of animal tissue samples is that they are 
very easily damaged, and they cannot be stored for long before they 
decompose. So, to preserve their structural integrity, they are usually 
immediately preserved (or ‘fixed’) in chemical fixatives when they are 
removed from the animal. Alternatively, for some studies, pieces of tissues 
may be preserved by rapid freezing, for example in liquid nitrogen. 


Most animal tissues are translucent when cut into thin sections, so early 
microscopists found it difficult to discern structural detail. During the 19th
century, the use of chemicals to fix and stain samples was developed and
stains were identified that bound to particular cellular components or to 
particular types of cell. The use of chemical stains to study tissues is known 
as histochemistry, and is described in Box 2.3. 


Box 2.3 Histochemistry: the use of chemical stains to 
identify cells and some cell components 


Two examples of histochemical stains that are used to identify cells and 
their components are outlined here. The first is the Gram stain, which is 
routinely used as one of the procedures for identifying different types 
of bacteria. 


The Gram stain is named after its inventor, the Danish physician 
Christian Gram. It was devised in 1884 as a method for detecting 
bacteria in animal tissues. The staining procedure starts with a heat-fixed 
smear of bacteria on a glass microscope slide. The smear is first stained 
with a dye called crystal violet and a mordant such as a dilute solution 
of iodine. (The mordant traps the dye inside cells by forming large 
complexes.) This procedure stains all the bacteria a deep purple. The 
smear is then treated with an organic solvent such as acetone or alcohol, 
which dissolves away the purple stain. However, some bacteria, referred 
to as ‘Gram-positive’, resist decolourisation and remain purple. This 
difference in response to the Gram stain arises from the structure of the 
outer layers of the bacteria, which you met in Section 1.2.3 and will 
learn more about in Chapter 3. The smear is then ‘counterstained’ with a 
red dye such as safranin. The bacteria that were purple (the Gram- 
positives) remain purple because the red dye does not show up, while the 
decolourised bacteria, the Gram-negatives, take up the safranin and 
appear red when viewed under a microscope. A typical Gram-stained 
slide of a mixed bacterial population is shown in Figure 2.4. 


Figure 2.4 The Gram stain, which allows visualisation of bacterial cells, is
used to distinguish between different groups of bacteria. The image shows
mixed Gram-positive (purplish) and Gram-negative (pinkish) bacteria.
(Scale bar: 10 µm.)




The second example of histochemical staining uses a combination of 
chemicals to visualise subcellular compartments, namely the nuclei and 
cytoplasm of eukaryotic cells. One of the chemicals is haematoxylin, 
which binds to negatively charged molecules such as those with many 
phosphate groups, and so stains nucleic acids and is used to visualise 
nuclei (Figure 2.5); while the chemical eosin binds to positively charged 
molecules, including many cytosolic proteins, and so is used to stain the 
cytosol. 


Figure 2.5 Image showing part of a section of guinea-pig trachea (the
airway that connects the mouth with the lungs). The section has been stained
with haematoxylin and eosin. The nuclei of the cells are stained dark purple;
the cytoplasm of different cells is stained different shades of pink/purple,
depending on their contents. The epithelial cells that line the trachea are
clearly visible, as are cilia, which are present on the surface of many of
these cells and which move, assisting the removal of unwanted material
from the airways to the mouth. (Scale bar: 100 µm.)


These and many other stains have been used with great effect for many 
years to study the organisation of tissues, both in specimens from healthy 
individuals and in samples from diseased tissues. 
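
The sequence of steps in the Gram stain described in Box 2.3 can be followed as a simple decision procedure. The sketch below is a deliberately simplified model, not a laboratory protocol: the function names and the single true/false property `resists_decolourisation` are illustrative assumptions standing in for the differences in the outer layers of the bacteria.

```python
def gram_stain(resists_decolourisation):
    """Follow one bacterial cell through the staining steps and return
    the colour seen under the microscope at the end."""
    # Step 1: crystal violet plus an iodine mordant stains every cell purple.
    colour = "purple"

    # Step 2: the organic solvent removes the purple dye, unless the cell
    # resists decolourisation.
    if not resists_decolourisation:
        colour = "colourless"

    # Step 3: the safranin counterstain shows up only on decolourised cells.
    if colour == "colourless":
        colour = "red"

    return colour


def classify(final_colour):
    """Interpret the final colour as a Gram reaction."""
    return "Gram-positive" if final_colour == "purple" else "Gram-negative"


for resists in (True, False):
    colour = gram_stain(resists)
    print(resists, colour, classify(colour))
```

The model makes one point explicit: the counterstain is applied to all cells, but it changes the visible colour only of those that lost the crystal violet.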


Much valuable information about cells and tissues has been obtained by 
histological techniques such as those outlined above, but during the past 40 
years or so, more sophisticated microscopes and more specific labelling 
techniques have been developed and are now widely used to identify 
particular molecules (often but not always proteins) within cells. Particularly 
useful in this respect are the large Y-shaped proteins called antibodies which 
are produced by the immune system of vertebrate animals in response to 
invasion by foreign material, e.g. infection by bacteria or viruses. You will 
learn some more about the immune response in Book 3, Chapter 3. 




An antibody recognises and binds specifically to one particular molecule (or 
part of a molecule). This specificity makes antibodies very useful tools and 
they are used extensively in the techniques of immunohistochemistry and 
immunocytochemistry (‘immuno’ comes from the term immune response). 
Immunohistochemistry is the localisation of specific molecules within tissue 
sections, whereas immunocytochemistry is the labelling of cell preparations, 
such as a cell culture or cell suspension (the cyto denoting ‘cell’). The two 
techniques are also sometimes known collectively as ‘immunolabelling’, which is
summarised in Box 2.4.


Box 2.4 Immunolabelling: using antibodies to identify 
molecules in cells and tissues 


Antibodies have the property of recognising and binding in a highly 
specific manner to a particular target molecule, termed an antigen. So, 
when an antibody is applied to a fixed tissue section or cell sample, it 
only binds to the cells that contain that particular antigen. It is then 
necessary to detect where the antibody has bound, which usually 
involves adding another, ‘secondary’ antibody that recognises the first 
antibody (the ‘primary’ antibody) bound to the antigen (Figure 2.6a). If 
the secondary antibody has been ‘labelled’ with a chemical such as a 
fluorophore, which emits fluorescent light of a particular colour when 
illuminated with light at specific wavelengths, the localisation of the 
bound primary antibodies can be viewed using a specialised fluorescence 
microscope. Alternatively, antibodies can be labelled with enzymes that, 
with the use of an appropriate substrate, produce a coloured reaction 
product at the site of antibody binding, which can therefore be seen 
using conventional light microscopy. 


This ‘indirect’ immunolabelling approach is very convenient because 
researchers can perform double (Figure 2.6b) (and even triple) labelling 
to simultaneously detect multiple molecules in the same sample using 
two (or three) different primary antibodies followed by appropriate 
secondary antibodies labelled with either fluorophores that emit 
fluorescent light of different wavelengths (Figure 2.7a), or different 
enzyme-substrate combinations that give different-coloured reaction 
products (Figure 2.7b). By choosing appropriate antibodies, it is possible 
to, for example, distinguish between different cell types in a tissue on the 
basis of the particular proteins that the cells express. 




Figure 2.6 (a) Indirect immunolabelling: unlabelled ‘primary’ antibody
molecules bind to the antigen. Labelled ‘secondary’ antibodies recognise and
bind to species-specific sequences on the primary antibody. (b) Double
immunolabelling: two different antigens can be localised in the same
specimen using two primary antibodies raised in different species. Two
different species-specific secondary antibodies are then applied, each coupled
to a different fluorescent or coloured marker.


In addition to microscopes that are used to view tissue sections and 
cultured cells growing as a single layer, more specialised microscopes 
called confocal microscopes are also widely used in research laboratories. 
Confocal microscopes (Figure 2.7c) allow images of fluorescent labelling 
to be captured at several different levels within a sample, and so allow 
very detailed analysis and even three-dimensional reconstruction by 
computer of labelled cells and thin tissues. With the continued 
development of such technology, microscopy remains one of the most 
versatile and widely used techniques in cell biology. 
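
The logic of indirect double immunolabelling described in Box 2.4 can be sketched as a matching problem: each labelled secondary antibody recognises primary antibodies raised in one species, so two antigens can be told apart only if their primary antibodies come from different species. The class and function names below are illustrative, and the host species ('mouse', 'rabbit') are hypothetical choices; the text does not say which species were used for the antibodies in Figure 2.7a.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrimaryAntibody:
    antigen: str       # molecule the antibody binds to
    host_species: str  # species in which the antibody was raised


@dataclass(frozen=True)
class SecondaryAntibody:
    target_species: str  # species whose antibodies it recognises
    label: str           # colour of the fluorophore or reaction product


def label_antigens(antigens_present, primaries, secondaries):
    """Return {antigen: label} for each antigen that ends up labelled."""
    result = {}
    for primary in primaries:
        if primary.antigen not in antigens_present:
            continue  # this primary antibody has nothing to bind to
        for secondary in secondaries:
            if secondary.target_species == primary.host_species:
                result[primary.antigen] = secondary.label
    return result


sample = {"keratin", "desmoplakin", "actin"}
secondaries = [
    SecondaryAntibody("mouse", "red"),
    SecondaryAntibody("rabbit", "green"),
]

# Primary antibodies raised in different (hypothetical) species.
distinct = [
    PrimaryAntibody("keratin", "mouse"),
    PrimaryAntibody("desmoplakin", "rabbit"),
]
print(label_antigens(sample, distinct, secondaries))

# Both primaries raised in the same species: the labels cannot be told apart.
same_species = [
    PrimaryAntibody("keratin", "mouse"),
    PrimaryAntibody("desmoplakin", "mouse"),
]
print(label_antigens(sample, same_species, secondaries))
```

In the first case each antigen receives its own colour, while 'actin' stays unlabelled because no primary antibody recognises it. In the second case both antigens receive the same label, which is why the primary antibodies for double labelling are raised in different species.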




Figure 2.7 (a) Double immunolabelling of cultured cells (a keratinocyte cell
line). The cells have been exposed to two primary antibodies, followed by
appropriate secondary antibodies labelled with different fluorophores. One 
primary antibody binds to one type of cytoskeletal protein (keratin, red) and 
the second primary antibody binds to one type of cell junction protein 
(desmoplakin, green). Other cytoskeletal and cell junction proteins are 
present, but not labelled, because the primary antibodies are highly specific. 
(b) Double immunolabelling of hormone-producing cells in the rat pancreas. 
The figure shows immunolabelling using two antibodies, one that recognises 
insulin, the other that recognises another pancreatic hormone, glucagon. The 
binding of the two antibodies is visualised indirectly, by the subsequent 
application of two different secondary antibodies that have been chemically 
coupled to enzymes that produce different-coloured reaction products; cells 
that contain insulin are stained blue, while those that contain glucagon are 
stained brown. (c) An example of a confocal microscope, being used to 
analyse cultures of nervous system cells. Supporting cells (green) and 
neurons (red) can be seen. 




2.2.4 Cell culture: the study of intact and living cells 


Finally in this section, the advantages of using culture techniques in cell 
biology research are considered, focusing on the analysis of cultured cells by 
microscopy. 


If tissues or cells are removed from an organism and provided with a suitable 
liquid growth ‘medium’ containing all the nutrients that they normally require 
for their metabolism, suitable conditions of temperature and pH, and (in some 
cases) an appropriate surface, or substrate, on which, or in which, to grow, 
then most cells remain alive for some time, and in many cases, grow and 
divide. Cells from many species have been grown successfully in culture. 


Single-celled organisms such as bacteria (and yeasts) can be grown in 
suspension in a liquid medium, or on the surface of nutrient agar plates. Agar, 
a gelatinous substance, is first made up as a liquid to which appropriate 
nutrients are added. The agar is then allowed to solidify in Petri dishes 
(sometimes referred to as ‘plates’), allowing samples of bacterial suspension to 
be spread on the surface. Spreading bacterial samples onto agar containing 
different nutrients and seeing if they grow to form colonies (Figure 1.11a) is a 
simple and convenient way to characterise the type of bacteria in an unknown 
sample. 


In contrast to single-celled organisms and other cell types (e.g. blood cells) 
that typically exist as independent cells in a liquid environment, the cells that 
form part of an animal tissue usually require a solid support, or substrate, to 
adhere to. The first cell culture experiments utilised glass dishes to grow small 
pieces of tissue (hence the term ‘in vitro’, from the Latin, meaning ‘in glass’ — 
in contrast to ‘in vivo’, meaning ‘in the living’). The success of the technique
was found to depend on the size of the tissue pieces. In isolated animal
tissue samples, the blood supply is necessarily lost, so gases can only pass
by diffusion between the growth medium and the cells of the isolated tissue.
Cells in the centre of larger chunks of tissue are thus
vulnerable to lack of oxygen and accumulation of carbon dioxide. Since these 
early experiments, however, methods have been refined, and enzymes and 
gentle mechanical agitation are frequently used to carefully break down the 
extracellular molecules that hold the cells of a tissue together, so that 
individual cells can be separated and obtained as a suspension. The cells are 
then ‘plated’ onto appropriately treated glass coverslips, or specially made 
plastic dishes. As soon as the cells are plated, they are provided with a 
synthetic liquid culture medium containing all the necessary nutrients, and 
placed in an incubator that has the appropriate temperature and gaseous 
conditions for the cells. 


The animal cells grown in such cultures usually flatten and form a single layer 
and so are clearly visible under the light microscope, allowing living cells to 
be examined. However, since animal cells are translucent, special optics that 
allow some cell components to be visualised are needed in order to see the 
living cells clearly, as shown in Figure 2.8. One such method is phase 
contrast microscopy, which uses the difference in the way light passes through
different parts of the specimen to increase the contrast of the image, allowing
some cell components, such as the cell membrane and nucleus, to be seen. 


Figure 2.8 Living cells in culture viewed by (a) standard light microscopy, and
(b) phase contrast microscopy. In (a) almost no detail of cell structure is visible,
whereas in (b), the cell membranes and nuclei are clearly visible, and some detail
of the cytoplasm can be seen. The box indicates the same cell viewed by the two
types of microscopy. (Scale bars: 50 µm.)


For the study of complex tissues, three-dimensional culture ‘models’ are 
increasingly used to more closely mimic the tissue of origin. Often such 
cultures are prepared using supportive gels made from proteins such as 
collagen, or synthetic materials. Cell suspensions or mixtures of cells are 
introduced into the gel before it sets, so their growth can be studied in a 
three-dimensional environment. 


The ability to grow certain cell types in cell culture offers many advantages 
for cell biologists. One advantage is the ability to readily assess the direct 
effects of exogenous agents added to the culture. The use of cultured animal 
cells avoids the complexity of whole animal studies, in which it is frequently 
difficult to discriminate the direct effects of an agent on a particular type of 
cell or tissue from secondary effects arising from an action of the agent on a 
different part of the animal. Although the effects seen in culture may be 
different from those seen in vivo (i.e. in the living organism), cell culture 
offers an extremely useful initial procedure with which to screen the possible 
harmful effects of new drugs and other chemicals; so a second advantage is to 
reduce the numbers of animals used for drug testing. A third advantage is that 
the external environment of the cultured cells can be manipulated very 
precisely: for example, by the addition of specific agents, such as signalling 
molecules, to the culture medium. This approach has proved invaluable to 
biologists who are interested in understanding cellular processes such as cell 
movement (Book 2, Chapter 5), and the factors that control cell proliferation 
and longevity (Book 3, Chapter 1), to name but a few. A further advantage is 
that the cultured cells can also be manipulated experimentally, to alter their
expression of particular genes (Chapter 6 in this book), allowing the roles 
played by specific gene products to be studied. 


Most primary cells (cells derived from a fresh tissue sample) are only able to 
divide a certain number of times in culture, so it can be difficult to accumulate 
sufficient numbers to work with. For studies where large numbers of cells are
required, cell lines are a more convenient source; these consist of homogeneous
populations of cells typically derived from tumours. Such cell lines are
‘immortal’; unlike primary cells, they have the ability to continue dividing 
indefinitely (Book 3, Chapter 1) and can be propagated as required from 
frozen stocks, without the need to obtain fresh tissue samples. Such cells do 
not behave exactly like ‘normal’ cells, but allow much useful work to be 
done. Some primary cell cultures can be ‘immortalised’ in vitro by various 
means such as the introduction of a particular type of modified virus. This
procedure provides a theoretically unlimited source of a particular cell type that
is more like its normal counterpart in an animal than a tumour cell line. 


Finally, living cultured cells can be viewed under the microscope using video 
or time-lapse microscopy, so their movements and interactions can be 
examined. For example, microscopy of intact cells has revealed that 
mitochondria are more complex than previously realised, undergoing both 
fusion and fission and moving around within cells. 


Summary of Section 2.2 


• Much of the current understanding of the chemical nature and organisation
of cells and how they function has come from microscopy and cell culture
techniques.

• Light microscopy provides valuable information about the organisation of
tissues, but is limited by its low resolving power.

• Coloured cell stains are used to better visualise cell structure by light
microscopy, in a technique known as histochemistry.

• Labelled antibodies are used in immunohistochemistry and
immunocytochemistry to localise specific molecules within cells and to
distinguish between cell types.

• Cell culture allows living cells to be studied. Individual cells can be
observed and the effects of specific molecules on cells and cell processes,
such as cell division, can be analysed.


2.3 Prokaryotic cell diversity 


Prokaryotes usually have a relatively simple structure, as outlined in
Chapter 1. Most Archaea and many Bacteria are round (cocci, Figure 2.9a), 
rod-shaped (Figure 2.9b) or thread-like (filamentous, Figure 2.9c), but some 
Bacteria have more complex specialisations (Figure 2.9d). Several bacterial 
phyla include species that are multicellular. For example, some cyanobacteria 
form chains of cells including specialised cells that ‘fix’ nitrogen, while the 
actinobacteria commonly form branched, multicellular filaments that spread
over a substrate and then send up aerial branches which produce and release 
single-celled spores. 



Figure 2.9 Coloured scanning electron micrographs (SEMs) illustrating common 
shapes of different bacterial species. (a) The coccus Streptococcus pneumoniae 
occurs in the respiratory tract of healthy individuals but can become pathogenic, 
causing conditions including pneumonia, pleurisy, peritonitis and meningitis. (b) 
The rod-shaped E. coli bacterium. Escherichia coli is found in the lumen (central 
space) of the intestines of healthy individuals. There are different strains of E. coli, 
some of which produce toxins, which can cause severe diarrhoea. (c) Spirillum 
volutans, which has a corkscrew or spiral shape. These bacteria typically live in 
water containing organic material, such as stagnant ponds, and require low 
concentrations of oxygen. Most species of Spirillum are not pathogenic, but some 
are, for example Spirillum minus causes the so-called rat-bite fever. (d) The 
bacterium Caulobacter crescentus (so-called because of its crescent shape) is found 
in freshwater, soil and seawater. Caulobacter has two very different forms: a 
mobile (‘swarmer’) cell with a hair-like flagellum for swimming, and an immobile 
reproductive form with an adhesive stalk that enables it to attach to surfaces (seen 
here). 



Some bacterial species have structures extending from the cell membrane; 
these are of two types, flagella (singular, flagellum, Figure 2.10a) and pili 
(singular, pilus, Figure 2.10b). Bacterial flagella are relatively long, distinctive 
structures that are capable of movement and are the means by which some 
bacteria can ‘swim’ through their aqueous environment. Some types of 
bacteria have a single flagellum, some have several and others none at all. 
You will learn how flagella cause movement in Book 2, Chapter 5. Pili are 
much shorter and thinner structures than flagella and are involved in adhesion 
of bacteria to various substrates, for example to eukaryotic cells during 
infection. Special long ‘sex’ pili are also involved in the transmission of 
genetic material between different bacterial individuals during mating, a 
process known as conjugation, which you will learn more about in Chapter 5 
of this book. 


Figure 2.10 (a) The predatory bacterium Bdellovibrio, which lives on other 
Gram-negative bacteria, showing the flagellum. (b) Higher magnification image of 
Escherichia coli, an inhabitant of the human intestine. The short hair-like 
appendages around the bacterium are a type of pili known as fimbriae, structures 
associated with bacterial adhesion to surfaces. This specimen is in the early stages 
of cell division. 


Finally, it is interesting to note that although most single-celled prokaryotes 
are small, there are exceptions. For example, among the spirochaetes, which 
have a characteristic shape rather like a coiled spring (Figure 2.9c), there are 
some that are up to 3 µm wide and over 100 µm long. These giant
spirochaetes all live symbiotically (symbiosis means ‘living together’) within 
invertebrate animals, notably in the guts of wood-eating termites. Even these 
monster microbes are dwarfed by the Gram-negative coccoid bacterium 
Thiomargarita namibiensis (sulfur pearl of Namibia), the largest bacterium 
known at the time of writing (Figure 2.11). Members of this species were 
found off the coast of Namibia, Africa in sediments on the sea floor. This 
environment is very low in oxygen and these bacteria use nitrate instead of 
oxygen during ATP synthesis (Book 2, Chapter 3). 



[Figure 2.11 image; scale bar 0.2 mm]


Figure 2.11 Thiomargarita namibiensis (sulfur pearl of Namibia). These bacteria 
are sometimes known as ‘sulfur pearls’ because they digest sulfur-containing 
compounds, producing sulfur, which is what gives them a ‘pearl-like’ appearance. 
They often occur in chains, as shown here, and although most are between
0.1 and 0.3 mm across, they can be up to 0.75 mm. 


■ How, using a microscope, could you determine that these organisms were
prokaryotes and not eukaryotes?


□ Using a light microscope, the absence of a nucleus, which is
characteristic of prokaryotes, would be evident.


You will learn more about the internal or subcellular organisation of 
prokaryotes in Chapter 3 of this book.


Summary of Section 2.3 

• Prokaryotes, particularly Archaea, do not exhibit great structural diversity,
although they do have a range of sizes, shapes and some structural
specialisations.


• Two structural specialisations associated with the cell membrane of some
bacteria are flagella, which are involved in movement, and pili, which are
involved in adhesion and transfer of genetic material.


2.4 Eukaryotic cell diversity 


The evolution of diverse cell types has enabled eukaryotic organisms to 
survive and reproduce in a range of environments. Selective pressures, 
including predation and competition for nutrients, would have led to the 
evolution of diverse structural specialisations that facilitate the acquisition of 
food and water and the ability to move (e.g. towards nutrient-rich areas and 
away from predators). In large multicellular organisms, the evolution of
cellular specialisations has allowed, for example, transport of gases and 
nutrients throughout the organism, or enabled signalling, for example of 
environmental changes, from one part of the organism to another. 


2.4.1 Protists 


The majority of protists are unicellular, although a small number are 
multicellular. There are some 30 phyla that are regarded as protists, including 
some that you may be aware of, such as algae (of which there are a number of 
different phyla), amoebae, slime moulds and diatoms (Figure 1.8). Examples 
that may surprise you are the marine algae, more commonly known as 
seaweeds, including some species that are able to reach very large sizes. For 
example, the giant kelps off northwest America can reach 50 m in length. 
Most protists live in (and during evolution diversified in) aqueous 
environments, and some are parasites, such as Giardia lamblia, which is a 
pathogen of the human gut. Because of their diversity, there is no ‘typical’ 
protist and no characteristic cell features that can be considered to be 
representative of the protists. 


Among the unicellular protists, selective pressures have led to the evolution of 
very diverse cell shapes. As well as very simple single-celled species, such as 
Amoeba proteus (Figure 2.12), there are very complex single-celled protists. 
This diversity is the result of the evolution of different mechanisms for 
movement, feeding, protection and support. In many of these organisms, 
different parts of an individual cell are specialised to perform specific 
functions. 


In aqueous environments, one result of selective pressures is the evolution of 
large size in single-celled organisms, which helps them to resist predation by 
animals that filter feed (strain small food particles from water) or engulf their 
prey (see below). An increase in size, however, poses some problems. 


■ Suggest a problem for single-celled organisms that are very large.


□ One problem is the absorption of sufficient nutrients; another is excretion
of wastes.


For example, a large spherical cell has a small surface area to volume ratio 
(that is, a small surface area compared with its volume), so absorption of 
nutrients from the surrounding environment is limited and diffusion of the 
absorbed nutrients to the centre of the cell is slow. Similarly, disposal of 
wastes (excretion) from the centre of a large cell poses a problem. These 
selection pressures have resulted in many protists that have evolved flattened, 
lobed shapes, which increase their surface area to volume ratio
(e.g. Figure 2.12a). 
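
The surface area to volume argument can be checked with a little arithmetic. The following is a minimal sketch, not part of the module text: the function names, the choice of a flat disc as a stand-in for a 'flattened' cell, and the example dimensions are all illustrative assumptions. It uses the fact that for a sphere of radius r the ratio is (4πr²)/((4/3)πr³) = 3/r, so the ratio falls as the cell gets larger.

```python
import math


def sphere_sa_to_volume(radius_um: float) -> float:
    """Surface area to volume ratio (per micrometre) of a sphere.

    Algebraically this is 3 / radius, so it shrinks as the cell grows.
    """
    surface_area = 4 * math.pi * radius_um ** 2
    volume = (4 / 3) * math.pi * radius_um ** 3
    return surface_area / volume


def disc_sa_to_volume(volume_um3: float, thickness_um: float) -> float:
    """Ratio for a flat disc (a short cylinder) of given volume and thickness.

    The disc is a hypothetical model of a flattened cell, chosen only
    because its geometry is easy to compute.
    """
    radius = math.sqrt(volume_um3 / (math.pi * thickness_um))
    # Two circular faces plus the rim.
    surface_area = 2 * math.pi * radius ** 2 + 2 * math.pi * radius * thickness_um
    return surface_area / volume_um3


# Illustrative radii in micrometres (assumed values, not measurements).
for r in (1, 10, 100):
    print(f"sphere, radius {r} um: SA:V = {sphere_sa_to_volume(r):.3f} per um")

# Flatten the largest sphere into a disc 10 um thick, keeping the volume fixed.
big_sphere_volume = (4 / 3) * math.pi * 100 ** 3
flat = disc_sa_to_volume(big_sphere_volume, thickness_um=10)
print(f"disc of the same volume, 10 um thick: SA:V = {flat:.3f} per um")
```

Under these assumptions, increasing the radius of a sphere a hundredfold reduces its ratio a hundredfold, whereas flattening the largest sphere into a thin disc of the same volume raises the ratio several times over, which is the geometric advantage of a flattened shape described above.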


Many protists also have the ability to engulf food particles by a process 
known as endocytosis, made possible by the possession of a flexible cell 
surface and the ability to extend lobes of cytoplasm known as pseudopodia 
(Figure 2.12a and b). The flow of cytoplasm into the pseudopodia is thought 
to involve particular proteins that are part of the cell ‘skeleton’ or 
cytoskeleton; you will learn more about this important group of proteins in 
Chapter 3. The ability to extend pseudopodia in this way also allows a form
of movement known as amoeboid movement, often described as ‘crawling’ or 
‘creeping’, by amoebae (and some animal cells) when on a solid surface. 
Pseudopodia are extended and the remainder of the cell follows behind. 


[Figure 2.12 labels: food particle (in the process of being engulfed); panel (b) scale bar 100 µm]


Figure 2.12 (a) An example of a structural specialisation that confers functional 
advantages. Amoeba proteus has amoeboid movement, enabling the organism to 
move and engulf food (see text). (b) Light micrograph of Amoeba proteus. Several 
pseudopodia can be seen. The pseudopodia are seen extending to move. The main 
part of the cell is coloured red and pink in this image. 


Another problem of increased size is transport and communication within the 
cell, from one area to another, particularly from the nucleus to more distant 
parts of the cell. Large unicellular protists have therefore evolved strategies to 
overcome this problem; some species have very large nuclei, others have more 
than one nucleus. These specialisations reduce the distance between the 
nucleus and other parts of the cell. 


Another advantageous strategy for single-celled organisms, which has already 
been mentioned, is the grouping of cells together into colonies (Figure 2.13).


■ Can you think of an advantage of colony formation?


□ Colonies are larger in size than individual organisms, which may reduce
predation by filter feeders.


There are examples of colony-forming protists among the Chlorophyta (a 
division of the eukaryotic green algae, not to be confused with the prokaryotic 
blue-green algae, i.e. the cyanobacteria, mentioned in Section 2.3). 
Chlorophytes are photosynthetic eukaryotes, some of which are unicellular, 
some colony-forming and others multicellular (Figure 2.13d). 


Chlamydomonas (Figure 2.13d (i)) is a unicellular genus of the Chlorophyta; 
its members have a flagellum, while members of the closely related Gonium 
genus (Figure 2.13d (ii)) form a small, disc-shaped colony of several cells, 
which is able to move through water because individual cells possess multiple 
flagella that all point in the same direction. All the cells in a Gonium colony 
are the same, and can individually give rise to a new colony. The size of 
Gonium colonies helps them to escape predation by filter-feeding organisms. 




Figure 2.13 Colonial protists. (a) A star-shaped colony of the diatom Asterionella 
formosa. Cells grouped this way sink more slowly than do single cells. (b) A
green alga with four-celled colonies, Scenedesmus sp. (c) A colony of the green 
alga Pediastrum. The colony has a flat circular shape. (d) Examples of volvocine 
chlorophyte species varying in cell number, colony volume and degree of 
specialisation: (i) Chlamydomonas reinhardtii, a unicellular species; (ii) Gonium 
pectorale, a flat or curved sheet of 8-32 undifferentiated cells; (iii) Eudorina 
elegans, a spherical colony of 16-64 undifferentiated cells; (iv) Pleodorina 
californica, a spherical colony with somatic cells and reproductive cells; (v) Volvox 
carteri; and (vi) Volvox aureus, which both contain a mix of reproductive and
somatic cells. Where two cell types are present (iv–vi), the smaller cells are
somatic cells and the larger cells are reproductive cells. 


Members of the chlorophyte genus known as Volvox (Figure 2.13d (v) and 
(vi)) form large groups of several thousand cells. Volvox cells are embedded in 
a gelatinous matrix that forms a hollow sphere in which individual cells are 
connected by their cytoplasm. Beating of individual cell flagella is 
coordinated, allowing the group to move. The cells are dependent on each 
other; if the group is disrupted, individual cells cannot divide independently
and eventually die. However, such groups of Volvox cells also include several 
specialised reproductive cells which are able to form complete new
mini-colonies inside the sphere of the ‘parent’ colony.


■ Based on the definitions in Section 2.1, are Volvox colonial or
multicellular organisms?


□ They are multicellular, because they contain specialised reproductive cells
which can form new colonies.


The Chlorophyta are very interesting organisms, not least because they have 
been used to study the genetic changes that have occurred during the 
evolution of multicellularity (Figure 2.13d). A selective advantage for the
multicellular Volvox is that the ‘daughter’ cells are protected inside the 
‘parent’ group of cells. Another important advantage for larger Volvox is that 
they can store essential nutrients and minerals such as phosphate that they 
have absorbed into the matrix between their cells. 


The problems associated with increasing cell size, described above, probably 
resulted in the evolution of true multicellularity, in which individual organisms 
consist of a number of different cell types, specialised to perform different 
functions.


Summary of Section 2.4.1 

• Many protists are unicellular, and some have evolved large size, diverse
structures and specialisations that enable them to feed and move
effectively, and avoid predation.


• Other protists have evolved a colonial mode of life, which has also
allowed them to feed effectively and avoid predation.


2.4.2 Fungi 


Fungi are probably less familiar to you than plants and animals, so first take a 
moment to consider some of the organisms that you would describe as fungi. 


You are likely to have thought of cultivated mushrooms and the ‘wild fungi’, 
including brackets (shelf fungi) on trees and the great variety of ‘toadstools’. 
These structures are all in fact fruiting bodies of fungi, reproductive structures 
that produce and release spores. In addition, you might have thought of the 
microscopic and, often, disease- or decay-causing fungi such as mildews, rusts 
and moulds, and the yeasts that are used in making wine, beer and some kinds 
of bread. 


Like plants, fungi have diversified mainly on land and they occur in soil and 
on the surface of, or inside the tissues of, other organisms (living or dead) 
virtually everywhere. The fact that you rarely see fungi except as the 
occasional toadstool or fuzzy patches of mould reflects the nature of the 
fungal ‘body’ and their mode of life. The majority of fungi consist of 
microscopic filaments or hyphae (singular, hypha) which grow at their tips 
and branch repeatedly (Figure 2.14). Hyphae have rigid cell walls in which 
the main structural component is chitin, a polysaccharide that is also found in
the outer skeleton of insects. The mass of hyphae is called a mycelium 
(plural, mycelia). 


Figure 2.14a (ii) and (iii) show that hyphae may have partitions or septa 
(singular, septum) which divide up the hyphae into cell-like compartments. 
However, the ‘cells’ may have one or several nuclei and the septa may be 
perforated, which allows nuclei and cytoplasm to move along the hyphae. In 
some cases (Figure 2.14a (i)) there are no septa at all and this state is 
described as coenocytic (from the Greek words for ‘shared’ and ‘vessel’). 




Figure 2.14 (a) Diagram illustrating types of fungal hyphae: (i) non-septate (coenocytic); (ii) septate with one 
nucleus per compartment; (iii) septate with many nuclei per compartment. (b) Coloured scanning electron micrograph 
showing hyphae of the fungus Trichophyton interdigitale growing on human skin. 


■ From Figure 2.14, should fungi be described as unicellular or
multicellular?


□ Neither term provides a precise description of the situation in fungi but
multicellular is how fungi are usually described.


Some fungi spend part or all of their life cycle as a unicellular or yeast form 
(Figure 2.15a and b). The yeast, Saccharomyces cerevisiae, which is used in 
baking, wine and beer making and as a ‘laboratory’ organism, is generally 
considered to be unicellular, but it has been reported that under certain 
conditions, it can form rudimentary hyphae. This example again highlights 
that much still remains to be learnt about the biology even of well-known 
organisms, and that the classification of cells and organisms is still often 
controversial. 


The hyphal branches of a fungal mycelium have an enormous surface area and 
here lies the clue to the fungal way of life. Fungi are heterotrophic absorbers; 
that is, they utilise pre-existing organic molecules (Chapter 1 and Book 2,
Chapter 3) as an energy source, and obtain these molecules by absorption 
from their environment. Because of their rigid walls, they cannot engulf 
particles and instead, their hyphae secrete enzymes, which break down large 
insoluble organic molecules, releasing soluble products that can be absorbed
to provide nourishment. The importance of fungal activity in breaking down
dead organic matter cannot be overstated, because it is central to the process
of decomposition, whereby mineral nutrients (e.g. nitrate and phosphate) are
cycled within ecosystems. Many fungi also live in partnership with plants or
photosynthetic algae, obtaining organic nutrients from these photosynthetic
organisms and usually supplying inorganic nutrients such as phosphate ions in
return. This symbiotic mode of life is very ancient: some early fossil plants
from over 400 Ma ago have been found with fossilised fungal partners and it
has been suggested that fungal symbionts played a major role in the invasion
of land by plants; perhaps an example of coevolution. Two species that are in
a coevolutionary relationship exert a selective pressure on each other;
therefore each species affects the evolution of the other.


Figure 2.15 (a) Diagram showing the usual unicellular structure of yeast,
Saccharomyces cerevisiae. (b) Coloured SEM showing Saccharomyces cerevisiae
(brewer’s, or baker’s, yeast) cells. These cells occur singly. Some cells can be seen
to have small protuberances; these are ‘daughter’ cells, which are formed by
budding off from the larger ‘mother’ cells.


Whatever their mode of life, however, the basic mycelial structure of fungi 
remains much the same. So although, like bacteria, fungi show great 
metabolic diversity in the substrates they use and diversity of reproductive 
structures, the structural diversity of the feeding stage of the life cycle is 
limited. 


Summary of Section 2.4.2 


• Classification of fungi as unicellular or multicellular organisms is difficult
for many species because of their cellular organisation, but most fungal
species are generally considered to be multicellular. Some species (the
yeasts) are considered to be unicellular, but classification of these species
is also difficult and somewhat controversial, because their growth
behaviour may change under different conditions.


• The majority of fungi form filamentous hyphae with rigid cell walls. This
arrangement is known as the mycelium. Hyphae may be divided up by
septa and may be multinucleate.


• Fungi play an essential role in the breakdown of organic material in the
environment.


2.4.3 Plants 


Plants are multicellular eukaryotes, adapted primarily to life on land. They are
photosynthetic organisms (Section 1.2.1) that evolved from green algae. 
Mature plants are non-motile and have cells with rigid walls strengthened by a 
polysaccharide known as cellulose. Most, but not all, have an upright leafy 
shoot (the photosynthetic part), non-green underground parts for anchorage 
and absorption (roots) and a vascular system to conduct water and nutrients 
around the plant. The first plants to evolve were the bryophytes, which are 
small and non-vascular and remain close to the ground; an example is the 
mosses. In contrast, plants with a vascular system are able to grow upwards 
and attain much larger sizes, and so compete effectively for the light needed 
for photosynthesis. The angiosperms, or flowering plants, form by far the 
largest plant group, with the greatest number of species, and this section will 
focus on cells of flowering plants. 


One of the most distinctive features of plant cells is actually an extracellular 
structure, the cell wall, which is rigid and surrounds the cell membrane, 
conferring shape and support. It is composed predominantly of cellulose. 
Small channels in the cell walls allow the passage of water, ions and small 
molecules from cell to cell. The thickness of the cell wall is one feature that 
distinguishes different types of plant cells. Another is the shape of the cell, 
and a third is whether or not the cell is alive; together with living cells, dead 
cells play important roles in providing support and transport vessels in plants. 


There are three main tissue types in flowering plants; these are: 


• ground tissue, which provides packing and support, and also energy
storage, and includes the majority of photosynthetic cells (palisade cells),
which are located in the interior of leaves

• vascular tissue, which enables transport of water and nutrients within
the plant

• dermal tissue, which is the outer cell layer, and provides protection, and
controls uptake of water, nutrients and gases, in different parts of the plant.


There are several cell types in each of these different plant tissues and some 
ground tissue cells are also found in the vascular tissues. Some of the 
different types of plant cells and tissues are shown in Figure 2.16. 


Ground tissue 

One type of plant cell, usually classified as a type of ground tissue cell, forms 
the basic structural element of all plant tissues. These cells are known as 
parenchyma cells and they form the bulk of stems, roots and leaves. 
Parenchyma cells are quite variable in size and shape (e.g. Figure 2.16b, c, d) 
but a common feature is that they have thin cellulose walls, so are readily
deformed by pressure from adjoining tissue. Adjacent cells are linked by pores
through their cell walls called plasmodesmata (singular, plasmodesma),
which are lined by a plasma membrane and have a strand of cytoplasm in the
middle. Small molecules usually pass freely through these pores, so their
number and distribution can play a key role in cell-to-cell transport and
communication.


[Figure 2.16 labels: epidermal cells; palisade cells; parenchyma cells; xylem; phloem; vascular tissue; starch grains; scale bars shown: 100 µm and 400 µm]
Figure 2.16 Light micrographs illustrating examples of different plant cells.
(a) The outer layers of a leaf showing epidermal cells (dermal tissue) at the surface
of the leaf and palisade cells (ground tissue), where most photosynthesis takes
place. (b) The deeper layers of a leaf. (Palisade cells and parenchyma cells are
both types of ground tissue.) (c) A section of a root showing the vascular tissues
(xylem and phloem) surrounded by ground tissues. (d) Parenchyma cells (ground
tissue) from a potato tuber, specialised for storage. The blue-stained objects are
starch grains.


Vascular tissue 


There are two types of vascular tissue in plants: xylem, which transports water 
and dissolved ions from the roots around the rest of the plant (Figure 2.16c); 
and phloem, which transports the products of photosynthesis around the plant. 
The arrangement of the two tissue types varies between different plants and in 
different parts of the same plant (e.g. stems and roots), but they are often 
situated close together. The cells of the vascular tissues are arranged end-to- 
end and form tube-like structures, arranged in bundles. 



Dermal tissue 


Plant dermal tissue, also known as the epidermis, is a layer of cells known as 
epidermal cells (Figure 2.16a) that covers the entire plant. It is usually a 
single cell thick. Epidermal cells have thickened external cell walls and in 
shoots are covered by a layer of waxy material called cutin forming a 
protective layer, the cuticle. 


The epidermis of leaves has pores known as stomata (singular stoma) 
(Figure 2.17), which open and close, thereby facilitating the gas exchange that 
is essential for photosynthesis to occur; carbon dioxide must be able to enter 
leaves and oxygen must be able to leave. Water is also lost from plants via 
open stomata; some water loss is necessary to ensure continuous transport in 
the xylem of water and mineral nutrients from roots to leaves. Stomata occur 
most frequently on the lower sides of leaves and in other protected areas such 
as infoldings of stem surfaces. They are bounded by specialised epidermal 
cells called guard cells, which change shape in response to internal and 
external stimuli (including light, heat and carbon dioxide concentration), so 
opening and closing the pore, and thereby regulating gas exchange and water 
loss. 


[Figure 2.17 labels: epidermal cell; pair of guard cells; guard cell; pore; cuticle; scale bar 50 µm]


Figure 2.17 (a) Surface view of a leaf epidermis showing a stoma with a pair of guard cells. (b) Coloured scanning 
electron micrograph showing stomata on the lower leaf surface of a garden rose (Rosa sp.). The leaf surface is 
covered by epidermal cells, among which are stomata which are bounded by two guard cells. Guard cells open and 
close the stoma, allowing gas exchange when the stomata are open, and preventing moisture loss when closed. 


Summary of Section 2.4.3 

• Plant cells have a rigid cell wall composed predominantly of cellulose.
This helps to support the plant.

• The three main types of tissue are ground tissue (packing, support,
storage), vascular tissue (transport) and dermal tissue (protection and
uptake of water, nutrients and gases).



2.4.4 Animal cells


Animal species exhibit a very wide range of complexity and cellular 
specialisation. Some animals are very simple, containing only a few different 
cell types, or not very highly specialised cells. Others, notably the vertebrates, 
comprise probably the most diverse and also the most highly specialised cells 
of all living organisms. This section will focus on vertebrate, particularly
mammalian, cells. 


At first, the early histologists studied parts of animals and categorised animal 
tissues according to their function; so the major tissue types were classified as 
nerve (communication), muscle (movement), epithelial (barrier) and 
connective (support and storage) tissues. Additional categories were blood or 
lymphoid cells, germ cells (reproduction), and glandular (endocrine) tissue, 
which is essentially a very complex type of epithelial tissue. In vertebrates, 
however, most tissues are compound in nature; that is, they contain a mixture 
of these six major cell types (for example, muscle contains muscle fibres, 
blood vessels (which themselves comprise several tissues), nerves and 
connective tissue), so there is now a somewhat less rigid classification of 
tissues. Nevertheless, it is a useful skill to be able to recognise some of the 
different cell types typically present in a tissue or organ.


■ Why might such a skill be useful in medicine?


□ A knowledge of the typical or ‘normal’ arrangement and relative
abundance of cells in a tissue allows detection of ‘abnormal’ cells or
arrangements of cells, which may occur in certain diseases such as cancer
or neurodegenerative diseases.


Identification of cells in a tissue specimen also allows deduction about the 
function of the tissue, which might, for example, be of interest in working out 
the physiology of a newly discovered animal. Cells with similar functions 
often (but not always) have a similar appearance, even in animals that are 
only distantly related. Some examples of animal tissues are shown in
Figure 2.18, and a summary of different cell types is given in Table 2.1. 


[Figure 2.18 labels: connective tissue; blood vessel with red blood cells; muscle fibre bundle; nucleus; leukocyte; red blood cells; scale bars 10 µm]
Figure 2.18 Transmitted light micrographs of some animal tissues. (a) Skeletal 
muscle, sectioned longitudinally (along the muscle) and stained with haematoxylin 
and eosin. Bundles of muscle fibres can be seen with their nuclei stained dark 
purple. Connective tissue and small blood vessels with red blood cells are visible. 
The individual myofibrils that make up each muscle fibre (Book 2, Chapter 5) 
cannot be seen at this magnification, but their cross-striations can be distinguished, 
because in each myofibril the cross-striations run in register to those of the 
neighbouring myofibrils. (b) Human adrenal gland, sectioned and stained with 
haematoxylin and eosin. This image shows the cells located near the centre of the 
gland (the adrenal medulla), which secrete the hormones adrenalin and 
noradrenalin. The cytoplasm of these cells is stained deep pink, their nuclei are 
purple. Small capillaries, containing red blood cells, are also just visible at this 
magnification. (c) Human adrenal gland, sectioned and stained with haematoxylin 
and eosin. This image shows the cells located near the edge of the gland (the 
adrenal cortex), which secrete steroid hormones, e.g. cortisone. The steroid- 
secreting cells appear pale because the cholesterol that fills their cytoplasm has 
been dissolved away by the chemicals used to fix the tissue. The pale steroid- 
secreting cells are arranged in oval-shaped clumps, enclosed by thin strands of 
connective tissue through which run wide irregularly shaped capillaries, full of 
bright pink red blood cells. (d) Human blood smear, stained with Leishman’s 
stain. Many red blood cells (salmon pink) and three leukocytes are visible. 
Leukocyte nuclei are stained blue/purple, their cytoplasm is granular and a very 
pale purple colour. Two of the leukocytes have lobed nuclei (these are 
neutrophils), the other is a lymphocyte and has a round nucleus. 




Table 2.1 Summary of the principal types of mammalian cells. Note that some cells, such as leukocytes, can be classified into more than one group (blood and immune system). Endocrine cells and other glandular cells are frequently considered to be a specialised type of epithelial cell. Adipose tissue cells are often classified as connective tissue. For detailed explanations, see text.

Cell type | Examples | Functions | Special features
epithelial cells | epidermis (outer layer of skin), lining of intestine, blood vessels and lungs, cells of glands (e.g. salivary glands, mammary glands) | protection, barrier, absorption, secretion | form sheets of closely linked, polarised cells*
hormone-producing (endocrine) cells | pancreas, adrenal gland (Figure 2.18b, c) | widespread communication | produce and secrete chemical messengers into the circulation
muscle cells | smooth muscle of internal organs, such as intestine and blood vessels | movement, e.g. peristalsis | contain contractile proteins, are linked together by ‘gap junctions’ (Chapter 3)
muscle cells | skeletal muscle of limbs (often known as striated muscle because it has a striped appearance) (Figure 2.18a) | movement of limbs | contain contractile proteins, form a syncytium of long multinucleate fibre-like cells
muscle cells | cardiac (heart) muscle | contraction of heart | contain contractile proteins
nerve cells (neurons) | neurons of brain and spinal cord, small groups of neurons (ganglia) in body | rapid and specific communication | are polarised cells with long processes; have special membrane properties that allow electrical signalling
support cells (often classified as connective tissue cells) | bone cells (osteoblasts and osteoclasts), cartilage, fibroblasts | provide support and help to organise tissue structure | fibroblasts produce much of the extracellular material (Chapter 3)
adipocytes (often classified as connective tissue cells) | adipose tissue around certain organs and under skin | energy storage, protection | have cytoplasm mostly composed of lipid
blood cells | red blood cells (RBCs) (Figure 2.18d) | oxygen transport | RBCs contain haemoglobin which binds oxygen; mammalian RBCs lose the nucleus when mature
immune system cells | leukocytes (several types) (Figure 2.18d) | defence | B lymphocytes and plasma cells produce antibodies, macrophages ingest pathogens, etc.
germ cells | eggs, sperm | reproduction | are haploid (contain half the normal number of chromosomes; Chapter 4)

* Polarised cells are explained in Section 2.4.5.




2.4.5 Mammalian cell diversity: an example 


As an example of the diversity of animal cells found in tissues and organ 
systems, this final section will consider some of the different types of cell 
found in the mammalian small intestine. Figure 2.19 is a photomicrograph that 
shows a section through the gut wall in which different layers of cells can be 
identified. 


[Scale bar: 100 µm]


Figure 2.19 Light micrograph of a section of rat small intestine, stained with 
Alcian blue, haematoxylin and eosin. The labelled boxes indicate areas of 
different cell types and these are illustrated in more detail in Figures 2.20-2.23. 


Each of the different cell types within the gut has a role to play in gut 
functions. Smooth muscle cells contract, causing a wave of constriction of the 
gut wall (known as peristalsis), which moves food along the intestine. 
Connective tissue cells provide support. Epithelial cells are a varied group; 
most are involved in the absorption of nutrients, but some produce digestive 
enzymes, some secrete mucus, which aids passage of contents along the gut, 
and yet others are specialised to secrete hormones into the bloodstream. Blood 
vessels, the larger of which are actually composed of several cell types, 
transport absorbed nutrients to the rest of the organism. Cells of the immune 
system (not visible in Figure 2.19) defend against damage by ingested 
pathogens. Nerve cells (neurons) coordinate the activities of the other cell 
types. 


Epithelial cells 


This section continues with a closer look at the structural and functional 
differences between some of these cells, starting with the epithelial cells, 
which form a barrier, or interface, across which some substances are secreted 
and nutrients are selectively absorbed. What structural and molecular 
properties of the epithelial cells confer these particular functional 
characteristics? The barrier properties arise because the epithelial cells are 
tightly packed next to each other as a distinct cell layer (Figure 2.20a). This 
packing occurs because of the type and arrangement of structural molecules 
within the cells, which results in the formation of special close contacts 
between them. 


The absorptive properties of epithelial cells arise because of the presence and 
arrangement of specific proteins, called transporters, in their plasma 
membranes. There are many kinds of transporters; those involved in 
absorption of nutrients are located only in the part of the cell membrane that 
comes into contact with food (known as the apical surface; the other 
boundaries, which contact adjacent cells and connective tissue, are known as 
the basolateral surfaces). So, the membrane of these epithelial cells is 
polarised; that is, its properties are different on one side of the cell compared 
with the other (Figure 2.20b). Absorption is facilitated by the presence of 
finger-like projections on the apical surface. These projections are known as 
microvilli, and they increase the surface area that is in contact with the 
ingested nutrients. The properties of a tissue or cell are therefore determined 
not only by the particular molecules that they contain, but also by the 
arrangement of these molecules within the cell. You will learn more about the 
properties of intestinal epithelial cells in Chapter 3. 
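
The gain in absorptive area from microvilli can be estimated with simple geometry. The sketch below is an illustration only: it treats each microvillus as a cylinder standing on a flat apical surface, and the dimensions and packing density are assumed round numbers, not measurements from this chapter.

```python
import math

def apical_area_with_microvilli(flat_area_um2, microvilli_per_um2,
                                radius_um, length_um):
    """Estimate apical membrane area once cylindrical microvilli are added.

    Each cylinder contributes its side wall (2 * pi * r * h). Its tip simply
    replaces the patch of flat membrane it stands on, so only the side wall
    counts as extra area.
    """
    number_of_microvilli = flat_area_um2 * microvilli_per_um2
    side_wall_um2 = 2 * math.pi * radius_um * length_um
    return flat_area_um2 + number_of_microvilli * side_wall_um2

# Assumed, illustrative values: microvilli 1 um long, 0.05 um in radius,
# packed at 30 per square micrometre of apical surface.
flat = 100.0  # um^2 of flat apical surface
total = apical_area_with_microvilli(flat, 30, 0.05, 1.0)
fold_increase = total / flat
print(f"fold increase in surface area: {fold_increase:.1f}")
```

With these assumed values the apical area comes out roughly ten times larger than the flat surface alone, which is the sense in which microvilli increase the area in contact with ingested nutrients.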




Figure 2.20 (a) Light micrograph showing epithelium from rat small intestine stained with haematoxylin and eosin, 
and also Alcian blue, which primarily stains mucopolysaccharides. Cytoplasm is pale pink; nuclei are dark purple/ 
black. The bright blue areas are mucus present in some of the epithelial cells. EC = epithelial cells; CT = connective 
tissue; BV = blood vessel. (b) Simplified schematic diagram showing some properties of intestinal epithelium. The 
epithelial cells form a barrier because they are closely packed together, and linked by specialised proteins 
(Chapter 3). The cells are polarised; the surface of absorptive cells that is in contact with the nutrients in the gut 
lumen possesses microvilli, which increase the surface area available for absorption, while the other surfaces do not. 
Also shown (not to scale) is the uneven distribution of transporter proteins in the membrane. Different types of 
transporter are present in the apical and basolateral membranes. 




Smooth muscle cells 


The smooth muscle cells (Figure 2.21) are contractile; that is, their shape can 
change, either shortening (contraction) or lengthening (relaxation). 
Coordinated contraction and relaxation of many smooth muscle cells together 
results in the intestinal movements known as peristalsis. How are the 
movements of the separate muscle cells coordinated, and what is the nature of 
the molecules that produce this movement? Again, specific proteins and 
structures are involved in these processes. One type of structure, known as a 
gap junction (Chapter 3), allows electrical and chemical communication 
between the cells that coordinates their contraction; other proteins cause a 
change in shape of the muscle cells during contraction (Book 2, Chapter 5). 

Smooth muscle is under involuntary control, while skeletal 
muscle is under conscious or voluntary control. 


[Figure 2.21 label: gap junctions. Scale bar: 100 µm.]


Figure 2.21 (a) Light micrograph showing smooth muscle from rat small intestine stained with Alcian blue, 
haematoxylin and eosin. (b) Simplified schematic diagram showing some properties of smooth muscle cells. Smooth 
muscle cells are closely packed, and linked by gap junctions. Muscle cells also contain specialised proteins that 
mediate contraction (not shown). 


Nerve cells 


Next, consider the nerve cell, or neuron (Figure 2.22). Small groups of 
neurons are situated within the gut wall, in small linked clumps known as 
ganglia. The neurons are not identical; they are diverse and have a number of 
different functions in the gut. All, however, are involved in conveying 
information to other cells. Some extend long processes into the surrounding 
smooth muscle, where they activate the smooth muscle cells, stimulating them 
either to contract or relax; others have processes that extend to the epithelium, 
or to other neurons. Neurons, like intestinal epithelial cells, are polarised. The 
functional properties of neurons are reflected in structural specialisations 
which are, again, the result of the presence and arrangement of specific 
proteins in different parts of the cell. 




[Figure 2.22 labels: dendrites; axon; target cell (e.g. smooth muscle). Scale bar: 100 µm.]


Figure 2.22 (a) Simplified diagram showing some of the structural features of 
neurons. Neurons are specialised to transmit electrical signals rapidly, often over 
long distances. Typically they receive information at processes known as dendrites, 
and transmit information to their target cell, which may be a smooth muscle cell 
(as illustrated here), an epithelial cell or another cell type, along a long cellular 
process known as an axon (not to scale). Neurons are polarised cells: different 
membrane proteins are found in different regions of the neuronal membrane. 
(b) Light micrograph showing a small group of neurons in the rat small intestine 
stained with Alcian blue, haematoxylin and eosin. (Note that the processes of the 
neurons are not visible in this preparation.) (c) Fluorescence micrograph of a 
neuron from rat small intestine, labelled with a fluorescent antibody. 


Fibroblasts and leukocytes 


Other cell types present in the gut include connective tissue cells, called 
fibroblasts, and leukocytes (Figure 2.23). These cells also have specific 
functions: fibroblasts provide support and secrete molecules that form the 
extracellular matrix (Chapter 3); leukocytes (sometimes referred to as white 
blood cells) are involved in defence against ingested pathogenic microbes. At 
first glance, these two cell types perhaps do not have such interesting 
structural specialisations as some of the other cells that have been described 
here, but they each contain and secrete special proteins that determine their 
functions. 


These five cell types (epithelial, smooth muscle, neuron, fibroblast and 
leukocyte) have been used here as examples of cellular diversity and will be 
referred to again in Chapter 3. There are, of course, several other types of cell 
in the gut, and many more in other organs. 


[Figure 2.23 labels: connective tissue; nuclei of fibroblasts; blood vessel. Scale bar: 100 µm.]

Figure 2.23 (a) Light micrograph showing fibroblasts and connective tissue from rat small intestine stained with 
Alcian blue, haematoxylin and eosin. A blood vessel is also visible in cross-section (lower right). (b) Simplified 
diagram showing some of the properties of fibroblasts. Fibroblasts have an irregular shape, and are often difficult to 
discern by light microscopy. They produce and secrete molecules into the extracellular space, forming the 
extracellular matrix and connective tissue fibres (composed of collagen and elastin). 


As you have seen, different types of animal cells differ not only in shape, but 
also, importantly, in the molecules that they contain. Many structural molecules 
and also enzymes involved in core metabolic reactions are common to all the 
cells of a particular organism, but different types of cells also contain 
additional molecules, usually proteins, that enable them to perform specialised 
functions. You will remember that proteins are coded for by the genetic 
material of the cell, DNA, which is situated in the nucleus of eukaryotic cells. 


= What is the name used for the units of DNA that encode different 
proteins? 


© They are called genes. 


You will recall that although all the cells of an organism contain the same 
genetic information, only some genes are expressed (transcribed and translated 
into proteins) in any particular cell type. Put another way, the different cell 
types of an organism express different genes. It is this differential gene 
expression that gives rise to the structural and functional differences between 
cells. How gene expression is controlled is an important topic, which you will 
return to in Chapter 6. 
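
The idea of differential gene expression can be pictured as sets: every cell type holds the same genome, but each expresses only a subset of it. The following is a minimal sketch under stated assumptions. The entries are broad protein categories borrowed from this section rather than real gene names, and the expression profiles are deliberately simplified.

```python
# The same genetic information is present in every cell type.
# (Labels are illustrative protein categories, not real gene names.)
GENOME = {
    "core metabolic enzymes",
    "structural proteins",
    "nutrient transporters",
    "contractile proteins",
    "gap junction proteins",
    "extracellular matrix proteins",
}

# Molecules common to all the cells of the organism.
COMMON = {"core metabolic enzymes", "structural proteins"}

# Assumed, simplified expression profiles for three gut cell types.
EXPRESSED = {
    "epithelial cell": COMMON | {"nutrient transporters"},
    "smooth muscle cell": COMMON | {"contractile proteins",
                                    "gap junction proteins"},
    "fibroblast": COMMON | {"extracellular matrix proteins"},
}

def shared_by_all(profiles):
    """Categories expressed in every cell type."""
    return set.intersection(*profiles.values())

def specific_to(cell_type, profiles):
    """Categories expressed in this cell type and in no other."""
    others = set().union(
        *(p for name, p in profiles.items() if name != cell_type)
    )
    return profiles[cell_type] - others

print(sorted(shared_by_all(EXPRESSED)))
print(sorted(specific_to("smooth muscle cell", EXPRESSED)))
```

In this toy model the shared set corresponds to the molecules common to all cells, and the cell-specific sets correspond to the additional proteins that underlie each cell type's specialised function.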


By now you will be beginning to appreciate the complexity of cell interactions 
in just one part of an animal. 


= Give two examples of cellular interactions that occur in the gut. 


© (1) Coordinated smooth muscle contraction involves interactions between 
individual smooth muscle cells. (2) Smooth muscle cells are stimulated to 
contract or relax by the action of neurons. 


All animal cells interact with adjacent and nearby cells, and in addition some 
specialised cells have evolved that provide communication over greater 
distances: for example, neurons, and also hormone-secreting (endocrine) cells. 
You will learn more about cell signalling and communication between cells in 
Book 2. 




Once a cell has received a signal from another cell, it usually responds in 
some way. For example, smooth muscle may contract in response to nerve 
stimulation; other cells, such as some types of epithelial cells, may undergo 
cell division, which often occurs in response to factors secreted by other cells. 
It is important to realise that biochemical changes underlie all cellular 
responses. Muscle contraction is brought about by changes in the 
conformation and activities of specific proteins within the smooth muscle cell 
(Book 2, Chapter 5). Cellular responses to signals often involve a chain, or 
cascade, of biochemical events within the cell (Book 2, Chapter 4), often 
culminating in changes in gene expression. Cell division, for example, 
requires coordinated changes in either the activity or synthesis of a large 
number of proteins (Book 3, Chapter 1). Cell interactions thus also play an 
essential role in the regulation of gene expression, and thereby help to 
determine the behaviour, properties and appearance of cells, as you will see 
throughout the module. 


Summary of Sections 2.4.4 and 2.4.5 

• The cells of animal tissues can be classified as epithelial, muscle, nervous, 
connective, blood and immune system, endocrine (hormone-secreting) or 
germ cells. Some cells, such as the hormone-secreting cells of the 
intestinal epithelium, fall into more than one of these categories. 


• Different types of cells are specialised to perform different functions. 
These differences are possible because different types of cells express 
different proteins, and have differing shapes and structural specialisations. 


• Mammalian tissues and organs typically comprise a mixture of different 
cell types; the intestine contains cells of all categories, except germ cells. 


• In animals, interactions between cells play crucial roles in the development 
and functions of both individual cells and organ systems. 


2.5 Final word 


This chapter has introduced the diversity and complexity in cell structure and 
function, and has begun to consider how this diversity can arise from an 
underlying uniformity of organisation. You have also been introduced to some 
of the techniques that are widely used to study cells, and you will encounter 
more examples of data obtained using these techniques during this module, 
and in your future study of biology. 


Diversity in shape, size, function and other cell properties has been illustrated 
here with examples from some better-known, well-studied organisms. It 
should be borne in mind, however, that cell biologists have not made detailed 
studies of all types of cells; and the structure and function of many organisms, 
and how they should be classified taxonomically, remain uncertain. As in 
other areas of biology, there is much that remains to be discovered about 
cellular diversity. In order to understand how cells perform their many 
functions, in the next chapter you will delve more deeply into the internal 
organisation of cells. 




2.6 Learning outcomes 
2.1 Describe, using examples, the diversity of cells in different organisms. 


2.2 Outline some of the techniques used in the study of whole cells, and 
interpret simple data obtained using these techniques. 


2.3 Describe the different types of cell found in the mammalian intestine, and 
how their structure relates to their function. 


2.4 Interpret and take measurements from annotated images of different cell 
and tissue types. 




Chapter 3 A tour of the cell 




3.1 Introduction 


In Chapter 2, you learnt about the great diversity in the function and 
appearance of cells, which is particularly striking among eukaryotic cells. 
Even very different types of eukaryotic cells, however, exhibit many common 
structures and functions. This mix of uniformity and diversity is also reflected 
in the organisation within cells. This chapter looks in more detail at the 
interior of cells — their subcellular structure, including their ‘ultrastructural’ 
features (sometimes called ‘fine structure’) which are visible using electron 
microscopy. Knowledge of the subcellular components of cells and how these 
components are arranged is fundamental to your understanding of how cells 
perform their functions: that is, how cells ‘work’. 


Schematic diagrams of typical animal and plant cells are shown in Figure 3.1a 
and b. You do not need to study this figure in detail now; it will be referred to 
again during the course of this chapter. 


Before beginning your study of cell components, some of the main techniques 
used to study the interior organisation of cells are briefly outlined. 


3.2 How is subcellular organisation studied? 


Subcellular organisation has been studied in two main ways: by electron 
microscopy (EM) and by the separation (fractionation) and subsequent 
biochemical or molecular analysis of the different cell components. 


3.2.1 Electron microscopy 


You will recall from Section 2.2 that light microscopes cannot be used to 
study very small structures (i.e. those less than approximately 0.2 µm across), 
because there is a limit to the resolution, or ‘resolving power’ of such 
microscopes. 


= What is the limiting factor that prevents small structures being 
discriminated using a light microscope? 


© The wavelength of visible light is the limiting factor. 


The problem of the limited resolution of the traditional light microscope was 
overcome in the 1930s, by the development of electron microscopes, which 
use beams of electrons instead of light. Electron beams have a shorter 
wavelength than that of visible light, and the wavelength of a beam of 
electrons decreases as the velocity (speed) at which they travel increases. 
Electron beams accelerated at high velocity through an appropriately prepared 
tissue section and focused on a fluorescent screen can hence be used to 
resolve cellular components that are as small as 1 nm across. 
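
The link between wavelength and resolution can be made concrete with two standard physics relations that are not given in this chapter and are introduced here as assumptions: the Abbe diffraction limit for a light microscope, and the relativistic de Broglie wavelength of an accelerated electron. The numerical aperture (1.4), the wavelength of green light (550 nm) and the accelerating voltages are assumed, typical values chosen for illustration.

```python
import math

H = 6.62607015e-34       # Planck constant, J s
M_E = 9.1093837015e-31   # electron rest mass, kg
Q_E = 1.602176634e-19    # elementary charge, C
C = 299792458.0          # speed of light, m/s

def abbe_limit(wavelength_m, numerical_aperture):
    """Smallest resolvable separation for a light microscope (Abbe limit)."""
    return wavelength_m / (2 * numerical_aperture)

def electron_wavelength(accelerating_voltage_v):
    """Relativistic de Broglie wavelength of an electron accelerated
    from rest through the given potential difference."""
    energy = Q_E * accelerating_voltage_v  # kinetic energy gained, J
    momentum = math.sqrt(
        2 * M_E * energy * (1 + energy / (2 * M_E * C**2))
    )
    return H / momentum

light_limit = abbe_limit(550e-9, 1.4)       # assumed green light, NA 1.4
lambda_10kv = electron_wavelength(10e3)     # assumed slower beam
lambda_100kv = electron_wavelength(100e3)   # assumed faster beam

print(f"light microscope limit: {light_limit * 1e9:.0f} nm")
print(f"electron wavelength at 10 kV:  {lambda_10kv * 1e12:.1f} pm")
print(f"electron wavelength at 100 kV: {lambda_100kv * 1e12:.2f} pm")
```

Under these assumptions the light-microscope limit comes out at about 0.2 µm, in line with the figure quoted above, whereas the electron wavelength is only a few picometres and becomes shorter as the electrons are accelerated to higher speeds. In practice the resolution of an electron microscope is much coarser than the electron wavelength, because it is also limited by the lenses and by specimen preparation.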


There are two different types of electron microscopy. Scanning electron 
microscopy (SEM) (Figure 1.11b) is a technique used to study the surface of 
intact cells and tissues. The sample is coated with a thin metallic layer that 
deflects an electron beam onto a detector. The internal organisation of cells is 
studied using the second technique, transmission electron microscopy 
(TEM), in which a beam of electrons is passed through a very thin tissue 
section, allowing the analysis of cell organelles and other components. 


[Figure 3.1 labels. (a): nucleus; centrosome; mitochondrion; cytoskeleton (microtubule, microfilaments and intermediate filaments); Golgi apparatus; rough endoplasmic reticulum; smooth endoplasmic reticulum; ribosomes. (b): smooth endoplasmic reticulum; nucleolus; chloroplast; membrane; wall.]


Figure 3.1 (a) Schematic diagram of a ‘typical’ animal cell; (b) schematic 
diagram of a ‘typical’ plant cell. 




Transmission electron microscopy 


Transmission electron microscopy has been greatly refined since the first 
commercial electron microscope became available in 1939. TEM has allowed 
detailed examination of cell ultrastructure and assisted the identification and 
investigation of cell organelles such as the Golgi apparatus (Section 3.4.7), 
which had previously been seen only as indistinct subcellular structures using 
histochemical techniques and light microscopy. 


As in light microscopy, samples must be fixed and processed before they can 
be viewed using a transmission electron microscope (Figure 3.2a), although the 
reagents used are different from those used for light microscopy. Preservation 
of the tissue is very important in this technique; it is crucial that the 
membranes of the cell organelles are preserved and not obscured by 
precipitates (insoluble deposits) formed during fixation. Glutaraldehyde, which 
fixes proteins, followed by osmium tetroxide, which fixes lipids, are the 
fixatives most often used. The fixed tissue is embedded in a very rigid resin, 
such as epoxy resin, which allows very thin sections to be cut on an 
ultramicrotome. ‘Ultrathin’ EM sections, which are used for study of cell 
organelles, are typically around 70 nm thick (compared with the 5-50 µm 
thick sections typically used for light microscopy). The image is obtained by 
passing electrons through the section, and focusing them on a fluorescent 
screen that emits visible light where electrons strike it. The interior of the 
microscope is under vacuum (to prevent scattering of the electron beam by air 
molecules) and the direction of the electron beam is controlled by magnets. 


Electrons pass readily through unstained tissue sections, because the cell 
components are made up of small atoms such as carbon, oxygen and 
hydrogen. To enable cellular components such as organelles to be viewed, the 
sections must first be ‘stained’ to increase contrast. However, unlike light 
microscopy, which uses coloured stains that absorb light (Box 2.3), in TEM, 
the stains contain heavy metals, such as lead and uranium. These are large 
atoms that prevent electrons passing straight through the section. Uranium 
binds preferentially to nucleic acids and proteins, while lead binds 
preferentially to lipids. So, after staining, cell components that are rich in 
lipids and areas where proteins and DNA are concentrated prevent the passage 
of electrons and so appear relatively dark, or ‘electron-dense’, on the viewing 
screen. Areas in which proteins are less concentrated, such as the cytosol of 
animal cells, appear pale, or ‘electron-lucent’. A typical TEM image, of a 
white blood cell, is shown in Figure 3.2b. 


Immunoelectron microscopy 


Immunolabelling to localise specific molecules to particular cell organelles 
(Box 2.4) can also be applied in electron microscopy. In EM immunolabelling 
(often known as immunoelectron microscopy, or immuno-EM), the antibody 
molecules are tagged with electron-dense substances, the most effective being 
small gold particles, which can be observed as dark dots. Different antibodies 
can be tagged with particles of different sizes, allowing detection of more than 
one type of molecule to be performed simultaneously. 




[Figure 3.2(b) labels: nucleolus; nucleus; mitochondria; Golgi apparatus.]


Figure 3.2 (a) A transmission electron microscope. (b) A transmission electron 
micrograph of a frog leukocyte (white blood cell). The nucleus and nucleolus 
(Section 3.4.3), mitochondria (Section 3.4.10) and Golgi apparatus (Section 3.4.7) 
can be seen. The dark area of the nucleus contains densely packed DNA. 


Figure 3.3 shows an example of how immunocytochemistry at the EM level 
can provide information about the localisation of molecules in an individual 
cell. The figure shows two cells of a bacterium, Neisseria meningitidis, 
labelled with an antibody that recognises a particular oligosaccharide (short 
chain of sugar units). The binding of this primary antibody is visualised using 
a secondary antibody coupled to gold particles, 15 nm in diameter, each of 
which can be seen as a very dark (electron-dense) dot. 


[Figure 3.3 labels: gold particles; cytoplasm; nucleoid (DNA). Scale bar: 300 nm.]


Figure 3.3 Electron micrograph showing two cells of the bacterial species 
Neisseria meningitidis immunolabelled with a primary antibody that recognises an 
oligosaccharide, followed by a secondary antibody that has been coupled to 
individual 15 nm gold particles. Note that the cytoplasm has uneven electron 
density. The DNA in the nucleoid is relatively pale (electron-lucent), while the rest 
of the cytoplasm is darker (electron-dense) and granular in appearance because it 
contains many ribosomes. 


= Looking at Figure 3.3, in which part of the bacterium is the 
oligosaccharide located? 


© The gold particles are, with only one exception, found on the outer 
surface of the two bacteria; so it can be deduced that this is where the 
oligosaccharide is predominantly localised in N. meningitidis. 


3.2.2 Cell fractionation 


Although electron microscopy has allowed the appearance of cell organelles to 
be studied, elucidation of their function required a method for separating the 
different organelles from the remaining cell components so that their 
biochemical properties could be studied. This vital link between cell structure 
and biochemistry was made possible by developments in a technique known 
as cell fractionation (Box 3.1). This procedure is carried out in two steps: 
disruption of the cells to release the subcellular components, followed by 
centrifugation (described below) to separate the components on the basis of 
their mass or density. 


Density is the mass of a given 
volume of a substance, 
i.e. density = mass/volume. 


Physiological solutions are 
artificially prepared solutions that 
have pH and salt concentrations 
that mimic the internal 
environment of the cell. 


Centrifugation 


Large particles suspended in liquid will eventually settle out under the 
influence of the Earth’s gravitational field (the strength of this force is denoted 
g). Small particles, however, will usually separate extremely slowly (or not at 
all) unless subjected to a much greater force than gravity. This can be 
achieved by centrifugation, in which the suspension of particles is spun round 
an axis at very high rotational speeds (Figure 3.4), creating what is commonly 
referred to as a centrifugal force, which causes particles to migrate away from 
the axis. The rate at which particles settle out (or sediment) depends on their 
size, shape and density, the density of the liquid, and the rotational speed of 
centrifugation. Centrifugation at different speeds allows the separation of 
particles into ‘fractions’, according to how readily they sediment (Box 3.1). 
For example, at low centrifugation speeds, large cells can be separated from 
small cells. In the 1940s and ’50s, advances in the technology of 
centrifugation allowed samples to be spun at very high speeds indeed, a 
refinement known as ultracentrifugation. For example, a centrifugation speed 
of 30 000 revolutions per minute (rpm) can produce a centrifugal force of 
100 000 times Earth’s gravitational force (100 000 × g). Ultracentrifugation 
using the appropriate types of sample preparation and solutions allows the 
separation of very small particles such as organelles and even 
macromolecules. 
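
The figures quoted for ultracentrifugation can be checked with the standard expression for centripetal acceleration, which depends on the distance from the axis as well as on the rotational speed. The sketch below assumes a distance of 10 cm from the axis of rotation; the text does not state a radius, so that value is purely illustrative.

```python
import math

G_EARTH = 9.80665  # standard acceleration due to gravity, m/s^2

def relative_centrifugal_force(rpm, radius_m):
    """Centrifugal acceleration at a given radius, as a multiple of g."""
    omega = 2 * math.pi * rpm / 60.0  # angular velocity, rad/s
    return omega**2 * radius_m / G_EARTH

def rpm_for(rcf, radius_m):
    """Rotational speed needed to reach a given multiple of g."""
    omega = math.sqrt(rcf * G_EARTH / radius_m)
    return omega * 60.0 / (2 * math.pi)

# Assumed radius: 10 cm from the axis of rotation.
rcf = relative_centrifugal_force(30_000, 0.10)
print(f"30 000 rpm at 10 cm gives about {rcf:,.0f} x g")
```

At this assumed radius, 30 000 rpm corresponds to roughly 100 000 × g, consistent with the example above. Because the force scales with the square of the rotational speed, doubling the speed quadruples the force.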


<— centrifugal force —> 


Figure 3.4 Centrifugation of particles in a suspension. High speed rotation of a 
suspension using a centrifuge creates a centrifugal force that forces the particles 
towards the bottom of the tube, where they collect in a pellet. 


[Figure 3.5 labels. (a) Cell suspension or tissue; cells disrupted; cell homogenate containing organelles. (b) Cell homogenate centrifuged in successive steps, each giving a supernatant and a pellet: e.g. whole cells, nuclei, cytoskeleton; e.g. mitochondria; e.g. microsomes and small vesicles; e.g. ribosomes, viruses, large macromolecules. (c) Cell homogenate layered on a sucrose gradient; after centrifugation, slow-sedimenting and fast-sedimenting components are separated; the tube is pierced for fraction collection.]


Figure 3.5 Fractionation of cellular components. (a) Homogenisation. 
(b) Differential sedimentation. (c) Density gradient centrifugation. See text 
for details. 




The subcellular components can then be recovered by centrifugation. The 
cell homogenate is dispensed into centrifuge tubes which are placed into 
a rotating holder (known as a rotor) that fits into the centrifuge. As the 
rotor turns, particles suspended in the homogenate migrate towards the 
bottom of the tube. Particles collected at the bottom of the tube form a 
‘pellet’. Since larger components travel fastest, they can be separated
from smaller particles by differential sedimentation (Figure 3.5b), in
which centrifugation at progressively higher speeds is used to separate 
particles in the homogenate according to their size and density. After an 
initial centrifugation, the pellet, containing the largest components, is 
separated from the remaining suspension (known as the supernatant) 
which contains the smaller components. The pellet can be re-suspended 
and its components studied, while the supernatant can be transferred to 
another centrifuge tube and spun at a higher speed to recover smaller 
components into another pellet. A series of such steps can be performed 
to separate the different components of a cell homogenate. 


A problem with differential sedimentation is that separation is not 
complete; pellets actually contain something of a mixture of cellular 
components. A more sensitive and widely used method is density 
gradient centrifugation in which the cell lysate is laid on top of a 
density gradient. This is prepared in a centrifuge tube by layering 
sucrose solutions of varying densities, the solutions becoming denser 
towards the bottom of the tube. Movement through the density gradient 
optimises the separation of particles of different density and size into 
different levels or bands in the gradient, so that they can be removed as 
separate ‘fractions’ (Figure 3.5c), either by using a fine pipette or by 
piercing the bottom of the tube and collecting the fractions as the liquid 
drips out. 
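
Why particles of different size and density end up at different levels can be illustrated with Stokes' law for a small sphere settling in a viscous liquid, v = d²(ρp − ρm)a / (18η). The sketch below is an illustration only and is not taken from the module text: the particle sizes, densities and viscosity are assumed round numbers, real organelles are not perfect spheres, and the increase in viscosity of concentrated sucrose is ignored.

```python
G_STANDARD = 9.80665  # standard acceleration due to gravity, m/s^2


def sedimentation_velocity(diameter_m, particle_density, medium_density,
                           viscosity, rcf):
    """Stokes' law settling velocity (m/s) of a small sphere.

    Densities are in kg/m^3, viscosity in Pa s, rcf as a multiple of g.
    """
    acceleration = rcf * G_STANDARD
    return (diameter_m ** 2 * (particle_density - medium_density)
            * acceleration / (18 * viscosity))


# Assumed, illustrative values (not from the text)
VISCOSITY = 1.0e-3  # Pa s, roughly that of water
small = sedimentation_velocity(1e-6, 1100, 1030, VISCOSITY, 10_000)
large = sedimentation_velocity(2e-6, 1100, 1030, VISCOSITY, 10_000)
banded = sedimentation_velocity(1e-6, 1100, 1100, VISCOSITY, 10_000)

print(f"large/small velocity ratio: {large / small:.1f}")
print(f"velocity where densities match: {banded} m/s")
```

In this idealised picture, doubling the diameter quadruples the sedimentation rate, and a particle stops moving once it reaches a layer of the gradient whose density equals its own, which is why components collect in distinct bands.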


Once separation is achieved, the biochemical properties of the different 
fractions can be analysed. Fractionation methods have been of great 
importance in cell biology because they enabled the functions of the 
different cell organelles to be elucidated. Indeed this method provided 
the first evidence for the existence of some organelles, such as lysosomes 
and peroxisomes (small membrane-bound organelles that contain 
different types of degradative enzymes, described later in this chapter). 


Summary of Sections 3.1 and 3.2 
• The internal structure of cells is studied by transmission electron
microscopy, which allows cell organelles to be visualised.

• Immunoelectron microscopy allows the localisation of molecules to
specific parts of cells.

• Cell fractionation techniques, including ultracentrifugation, allow cell
organelles to be separated for subsequent biochemical analysis.




3.3 The organisation of prokaryotic cells


You have learnt that the general appearance of prokaryotic cells is relatively 
simple; but what about their internal organisation? 


As outlined in Chapter 1, the interior of prokaryotic cells appears rather 
homogeneous when viewed with a light microscope because they do not 
contain a nucleus or membrane-bound organelles. When viewed under the 
electron microscope however, areas of different electron density can be seen. 
The prokaryotic genome typically consists of a circular DNA molecule located 
in the cytoplasm. The area in which the DNA is located (the nucleoid, 
Section 1.2.3) appears relatively pale, whereas the ribosomes, where protein is 
synthesised, are darker, and give the cytoplasm a granular appearance, as can 
be seen in Figure 3.3. You will learn more about ribosomes in Section 3.4.4.


Some bacteria have visible cytoplasmic ‘inclusions’ that are not membrane- 
bound. Examples of some different inclusions are: gas vesicles (which are 
surrounded by a protein shell and have a role in the buoyancy of some aquatic 
bacteria), endospores (dormant structures formed within some types of 
bacteria when growth conditions are unfavourable, which can later germinate 
and give rise to new individual bacteria) and glycogen granules (glycogen is a 
polymer of glucose, which some bacteria store as an energy reserve). 


A distinctive feature of bacteria, when viewed at either high or low 
magnification, is their surface. At high magnification, it can be seen that there 
are several distinct components to the structure that surrounds the bacterium. 
Like all other cells, bacterial cells are surrounded by a cell membrane, which 
is the innermost of these components (Figure 1.9). Most bacteria also have a 
cell wall that surrounds the cell membrane. This provides support, gives the 
cell its shape and prevents the cell from expanding, or even bursting by taking 
up too much water. The cell walls of Bacteria (but not of Archaea) are 
composed of peptidoglycan, which is a large polymer consisting of a complex 
arrangement of sugars linked by amino acids (including three amino acids that 
are not found naturally in proteins). The cell wall is different in Gram-positive 
and Gram-negative bacteria (Figure 3.6). 


■ From Figure 3.6, what is the difference between the cell walls of
Gram-positive and Gram-negative bacteria?


□ The peptidoglycan layer is much thicker in Gram-positive bacteria.


The cell wall of Gram-negative bacteria consists of a much thinner layer of 
peptidoglycan forming a loose network within the periplasmic space 
separating the inner and outer membranes. The periplasmic space is filled with 
a gel and contains various enzymes and other proteins. It is the dense 
peptidoglycan layer that traps the purple Gram stain inside Gram-positive 
bacteria (Box 2.3). In contrast, the thin peptidoglycan layer of Gram-negative 
bacteria is not sufficient to retain the stain, and in the final step of the Gram 
staining procedure, treatment with an organic solvent dissolves the cell 
membranes and decolourises the cell. Gram-negative bacteria have a second 
membrane known as the outer membrane (Figure 3.6) which is outside the 
cell wall. 




Figure 3.6 Diagram illustrating the differences between the cell walls of 
(a) Gram-positive and (b) Gram-negative bacteria. 


Finally, some bacteria secrete polymers, usually polysaccharides and 
sometimes proteins, which form a slimy, outer capsule (Figure 1.9) that 
protects the cell from desiccation and from phagocytosis (by the cells of 
another organism). 


Not all bacteria have such a simple internal structure as the one shown in 
Figure 1.9. Some bacteria have internal specialisations: for example, in the
photosynthetic cyanobacteria, the cell membrane folds into the cytoplasm to 
form stacks of membranes, which are the site where photosynthesis occurs. In 
other prokaryotes, the cell membrane folds inwards in a more irregular 
arrangement. Some bacterial species also have structures extending outwards 
from the cell membrane. 


■ Which two types of structure extend from some bacterial
cell membranes?


□ Flagella and pili (Section 2.3).


You will learn more about the organisation of flagella and how they function 
in Book 2. 


Summary of Section 3.3 


• Prokaryotes do not have a nucleus or other internal membrane-bound
organelles and their cytoplasm appears relatively homogeneous under the
light microscope. Under the electron microscope, the region containing
their DNA appears pale, because the rest of the cell is densely packed
with ribosomes, which are far more electron-dense and so appear dark.

• Most bacteria have a cell wall that lies outside the cell membrane. The cell
wall of Gram-positive bacteria is composed of a thick layer of
peptidoglycan, while Gram-negative bacteria have a thinner layer of
peptidoglycan within the periplasmic space between the inner and outer
cell membranes.

• Some bacteria possess membrane specialisations (relatively common
examples are flagella and pili) and a few exhibit complex folding of the
cell membrane.


3.4 The organisation of eukaryotic cells 


In eukaryotic cells many activities are compartmentalised within the 
organelles. The different organelles serve different functions, although in fact 
each type of organelle (e.g. the nucleus, or the endoplasmic reticulum, 
described later in this section) may play a role in several different activities. 
Some activities, for example the production and processing of proteins, 
involve different parts of the cell, or several organelles, as you will see. 


■ Suggest some key cellular functions or processes.


□ You may have thought of the synthesis of different macromolecules
(DNA, proteins, lipids and carbohydrates), secretion, movement, cell
division, protection, and there are many more!


Below, a ‘tour’ through the eukaryotic cell describes the different subcellular
components and their functions, with an emphasis on a typical higher 
(mammalian) animal cell, but also describing some of the cell components in 
other eukaryotic organisms. It should become clear, as you work through this 
section, that eukaryotic cells and their components are highly dynamic. 


3.4.1 Cell surfaces 


The cell surface is functionally very important, because it is the cell’s 
interface with its environment. 


■ Suggest two functions of the cell surface.


□ Protection, and absorption of nutrients. You may also have thought of:
secretion of signalling molecules or enzymes, disposal of cellular wastes,
gas exchange, or cell-cell recognition.


One major difference between different eukaryotes is the presence or absence 
of a cell wall. 


■ In which eukaryotic kingdom do all cells lack a rigid cell wall?


□ The cells of organisms in the animal kingdom lack a rigid cell wall
(Figure 3.1a).


Plant cells (Figure 3.1b) and fungal cells have a cell wall, which provides 
support and determines cell shape, while protists (Chapter 2) are diverse in 
this respect; some, such as amoebae, do not have a rigid cell wall, but many 
do. All cells, however, have a cell membrane, which acts as a barrier and an 
interface with the external environment. 




Activity 3.1 at the end of this 
chapter is an interactive resource 
that provides more detailed 
images of subcellular structure. 




The cell membrane is composed predominantly of phospholipids, arranged in 
a bilayer. A schematic diagram illustrating the main features of a typical 
animal cell membrane is shown in Figure 3.7 (you will learn more about 
lipids and cell membrane organisation in Book 2, Chapter 2). Within the 
bilayer are embedded a variety of proteins and glycoproteins (proteins that 
have sugars attached). These proteins, many of which span the membrane, 
play crucial roles in the interactions of the cell with its environment. Some 
membrane proteins act as transporters or channels that allow selective 
movement of ions, nutrients or other molecules into the cell. Others, often 
glycoproteins, act as receptors, which respond to specific molecular changes in 
the extracellular environment and transduce information about the environment 
to the inside of the cell, thereby allowing appropriate responses to be initiated. 


[Figure 3.7 labels: extracellular fluid; sugar chain of glycoprotein; sugar chain of glycolipid; phospholipid bilayer; integral protein; protein channel (pore); cholesterol molecule; peripheral protein; cytosol.]


Figure 3.7 A schematic diagram of the animal cell membrane. The membrane 
consists of a fluid phospholipid bilayer with proteins embedded in it. Integral 
membrane proteins interact with the hydrophobic ‘tails’ of lipid molecules within 
the lipid bilayer; some of these proteins are transmembrane proteins that act as 
channels (pores) for transport of molecules into and out of the cell. Peripheral 
proteins interact only with the outer hydrophilic ‘heads’ of the lipid molecules. 
Membrane lipids and proteins are often glycosylated, i.e. attached to sugar chains. 


Yet other glycoproteins act as recognition molecules, and can promote 
adhesion between adjacent cells in a tissue. The cell membrane is also linked 
to proteins on its cytoplasmic surface (intracellular proteins). These include 
components of the cytoskeleton, which have a structural role, maintaining the 
shape of the cell. 


3.4.2 The cytoskeleton 


The cytosol of eukaryotic cells contains a system of specialised protein 
assemblies that together form what is known as the cytoskeleton. Another
term often used is cellular ‘scaffolding’, which is perhaps rather misleading,
since the protein assemblies that make up the cytoskeleton are not fixed, but 
are highly dynamic and play an essential role in the transport of organelles 
and some molecules within the cell. Cytoskeletal proteins also have other 
roles; their actions produce all kinds of cell movements, including the 
movement of motile cells (Book 2, Chapter 5) and intracellular movements, 
such as those of chromosomes during cell division (Book 3, Chapter 1). In 
animal cells, the cytoskeleton also provides mechanical strength and support, 
and helps to maintain cell shape. In plant cells, the cell wall fulfils this role. 


Cytoskeletal proteins form long, filament-like assemblies. There are three
types of filaments, formed from different proteins: microfilaments (also 
known as actin filaments), microtubules and intermediate filaments 
(Figure 3.8). 


[Figure 3.8 labels: (a) microfilament (actin); (b) microtubule (tubulin), built from tubulin dimers, 25 nm in diameter; (c) intermediate filament, built from protofilaments, 8–10 nm in diameter.]


Figure 3.8 Simplified schematic diagrams showing the structure of (a) a 
microfilament; (b) a microtubule; and (c) an intermediate filament. 


Each type has a distinct distribution in cells, as illustrated in Figure 3.9 for
intestinal epithelial cells. All three types of filaments are associated with other
types of protein in the cell.


Microfilaments 


Microfilaments are present in all eukaryotic cells and are the thinnest of the 
filament types, having a diameter of about 6 nm (Figure 3.8a) and a thread- 
like appearance under EM. They are composed of molecules of the protein 
actin organised into a long helical chain, and so are also sometimes called 
actin filaments. Most microfilaments do not occur singly but are linked 
together in networks or bundles. 




It used to be thought that 
prokaryotic cells did not have a 
cytoskeleton, but recent research 
has revealed that they have 
filaments assembled from 
proteins similar to those found in 
eukaryotic cells, e.g. actin. The
prokaryotic cytoskeleton has 
roles in cell division and 
maintaining cell shape and 
polarity. 




Typical plant cells do not have an 
equivalent of the centrosome. 
Instead, less well-defined 
microtubule organising centres 
(MTOCs) appear at different 
times and places to organise 
microtubule networks in different 
types of plant cell. 




Networks of microfilaments are particularly prominent around the edges of 
cells, in a region just below the cell membrane, sometimes called the cell 
cortex (Figures 3.9a and 3.10). Bundles of microfilaments are found in 
microvilli of absorptive epithelial cells (e.g. in the intestine), and in the 
leading edge of moving cells, where the ability of the actin filaments to 
rapidly disassemble and reassemble plays a key role in cell motility (Book 2, 
Chapter 5). 


[Figure 3.9 panels: (a) microfilaments; (b) microtubules; (c) intermediate filaments. Scale bar: 25 µm.]


Figure 3.9 The arrangement of cytoskeletal proteins in intestinal epithelial cells: 
(a) microfilaments form a scaffold around the perimeter of the cell;
(b) microtubules radiate from the centrosome; (c) intermediate filaments link the
cell to adjacent cells and the extracellular matrix at specific sites at the surface of 
the cell. 


Microtubules 


Microtubules are, as their name implies, tubular structures composed of two 
forms (called α and β) of a protein known as tubulin. They are present in all
eukaryotic cells and play a crucial role in maintaining cell shape and also in 
intracellular movement: for example, the movement of cell organelles from 
one part of the cell to another, and the reorganisation of chromosomes into the 
daughter cells during cell division. Microtubules are effectively hollow tubes 
typically consisting of 13 parallel filaments of tubulin assemblies, and they 
measure about 25 nm in external diameter (Figure 3.8b). 


In animal cells, microtubules radiate from a microtubule organising centre
(MTOC) known as the centrosome, which is located near the nucleus
(Figure 3.1a and Figure 3.9b). Microtubules are very unstable, and are
constantly being disassembled and reassembled. They begin to assemble at the 
centrosome, and the growing tubules extend out radially, towards the edges of 
the cell (Figure 3.10). Some microtubules disassemble before they reach the 
cell cortex, but they will become stabilised if they attach to an organelle or 
bind to a protein known as a capping protein. 


■ What might happen to the shape of a cell if the capping proteins become
confined to one part of the cell cortex?


□ Microtubules would only become stabilised at that region of the cell
periphery, so the cell could become polarised; that is, it could acquire an
asymmetrical shape.


■ What types of animal cell are inherently asymmetrical in shape?


□ Neurons and some kinds of epithelial cells are examples of asymmetrical
cells (Chapter 2).


The selective stabilisation of microtubules, determined by the location of 
capping proteins, is therefore an important factor in determining cell shape. 
The arrangement of microtubules also plays an important role in the 
distribution of organelles within the cell. Microtubules are also components of 
cilia and flagella, which are described in more detail in Book 2, Chapter 5. 


[Figure 3.10 scale bar: 10 µm]


Figure 3.10 Fluorescent light micrograph of two fibroblast cells. The nuclei are 
labelled pink, microtubules (composed of the protein tubulin) are labelled yellow, 
and microfilaments (composed of the protein actin) are labelled blue. Note that 
the microtubules radiate outwards from the centre of the cell, while actin filaments 
are prominent at the edges of the cell. 


Intermediate filaments 


Intermediate filaments are so called because they are intermediate in diameter
between microfilaments and microtubules, measuring about 8–10 nm. Whereas
microfilaments and microtubules are both assembled from a single type of
protein (actin and tubulin, respectively), there are several types of intermediate
filament protein. You are probably most familiar with the type called keratins,
which are abundant in the cells at the surface of mammalian skin. An
intermediate filament contains about eight ‘protofilaments’ wound around each
other in a rope-like structure (Figure 3.8c) which confers great strength.
Unlike actin filaments and microtubules, the intermediate filaments are not
involved in cell movements; their main role is to provide mechanical strength
to cells and tissues.


3.4.3 The nucleus


The nucleus is the largest, and so usually the most easily identified cell 
organelle. In a typical animal cell, the nucleus is roughly spherical in shape 
(Figure 3.11) and is often, but not always, situated near the centre of the cell. 
The nucleus contains the majority of the genetic information of the cell. Two 
of the major activities that take place in the nucleus are: DNA replication (the 
synthesis of new DNA in preparation for cell division) and transcription (the 
production of RNA copies of parts of the DNA sequence). The production of 
a messenger RNA (mRNA) is the first step in the synthesis of proteins 
(Section 1.1.2). Another important process that takes place in the nucleus is 
the assembly of ribosomes, the complex structures in the cytoplasm where 
mRNA is translated into protein. 


[Figure 3.11 labels: (a) heterochromatin; euchromatin. (b) nuclear envelope (outer membrane and inner membrane); pores in nuclear envelope; endoplasmic reticulum.]


Figure 3.11 (a) Electron micrograph of the nucleus of a rat liver cell. Heterochromatin and euchromatin, which are 
described in the text, can be clearly differentiated as electron-dense (dark) and electron-lucent (pale) areas, 
respectively. Two electron-dense nucleoli (see below) are also clearly visible. (b) Schematic diagram showing the 
structure of the nucleus, which is linked to the endoplasmic reticulum. 


82 


In order for these three major activities to take place, much coordination and
many other ‘subsidiary’ activities are necessary. For example, replication and
transcription do not take place at the same time. The enzymes needed to
synthesise DNA or transcribe it into RNA must be present at the right places
and times, and, since protein synthesis occurs in the cytoplasm, there must be
a means of exporting mRNA (and ribosomes) out of the nucleus, and of importing
specific proteins into it. How does the organisation of the nucleus and its
component molecules allow all of these activities to take place?


The arrangement of DNA in the nucleus 


You will recall that in eukaryotes, DNA is organised into chromosomes 
(Section 1.2.4), each of which contains a single, very long double-stranded 
DNA molecule. These molecules are much longer than the diameter of the 
nucleus, and are packaged into the nucleus by winding around proteins known 
as histones, which help to coil the DNA, as shown in Figure 3.12. This 
complex of DNA and proteins is known as chromatin. The binding of DNA 
to histones is very ordered and the degree of packing varies. The DNA is 
most tightly condensed in mitotic chromosomes (as shown in Figure 3.12) just 
before cell division (Chapter 4, Section 4.3), when the highly condensed 
chromosomes can be easily seen under the light microscope. The rest of the 
time, individual chromosomes are not visible by light microscopy because 
they are far less condensed, allowing easy access to the enzymes that carry 
out DNA replication and transcription. Constant adjustment of DNA 
packaging and the spacing between nucleosomes is one way in which gene 
expression is controlled (Chapter 6). 


If you look at Figure 3.11a, you will see that some areas of chromatin (particularly
around the edges of the nucleus) have a dark, electron-dense appearance,
while other areas of the nucleus appear paler. The chromatin in the dark areas
is known as heterochromatin and is more highly condensed than the chromatin in
the paler areas, which is called euchromatin. In general, heterochromatin
contains far fewer genes and is far less ‘transcriptionally active’ than 
euchromatin. 


The nucleolus 


Another example of the way that the organisation of chromosomes within the 
nucleus relates to function is in the nucleolus (plural, nucleoli), where 
ribosomes are assembled. When viewed by electron microscopy, nucleoli 
appear as large rounded patches of an electron-dense material with a granular 
appearance (Figure 3.11a). Ribosomes are composed of protein and RNA 
molecules (Section 3.4.4) and the genes needed for the production of the 
ribosomal RNA components are located in clusters on several of the 
chromosomes. The relevant sections of these chromosomes loop into the 
nucleolus (Figure 3.13), where the ribosomal RNA molecules transcribed from 
this DNA are packaged together with ribosomal proteins imported from the 
cytosol. The assembled ribosome subunits are responsible for the electron- 
dense appearance of the nucleoli in TEM. 


The size of the nucleolus reflects the activity of the cell. 


■ Given the function of the nucleolus, what might its size indicate about
the activity of a cell?


□ The presence of a large nucleolus (or multiple nucleoli as in
Figure 3.11a) suggests that the cell is synthesising a large amount of
protein, as the nucleolus is the site of ribosome assembly, and ribosomes
are required for protein synthesis in the cytosol. A small nucleolus
suggests that a cell is not synthesising much protein.


[Figure 3.12 labels: DNA double helix; nucleosome; histone protein complex; 30 nm; section of chromosome, 300 nm; condensed section of chromosome, 700 nm; mitotic chromosome, 1400 nm.]

Figure 3.12 Schematic diagram illustrating the ordered binding of DNA to
histone proteins. The double helix is wound around a complex of histone proteins
(shown in purple), forming a series of structures called nucleosomes, which look
rather like beads on a string. This compacts the chromatin into a strand which has
a width of 11 nm. The complex winds further and then loops and condenses. The
greatest degree of chromosome condensation is seen in cells that are about to
divide.


Nuclear structure and the transport of molecules 


The structure of the nucleus is maintained by a family of intermediate filament 
proteins known as lamins, which form a network of filaments on the inner 
surface of the nuclear membrane. This network is called the nuclear lamina 
and is linked both to the nuclear membrane and to the chromatin; the nuclear 
lamina is thought to help to organise the chromatin. 


When viewed at high magnification, it can be seen that the membrane that 
surrounds the nucleus is actually a double membrane, or nuclear envelope, as 
it is sometimes called. Within the envelope are gaps called nuclear pores 
(Figure 3.11b and Figure 3.14) which create channels of about 9 nm diameter. 
Small water-soluble molecules diffuse freely through the nuclear pores, but the 
movement of proteins into and out of the nucleus is regulated by a
complex arrangement of proteins known as the nuclear pore complex. The 
nuclear pore complex allows RNA molecules and ribosomes to pass out of the 
nucleus, and allows selected proteins to enter the nucleus, but prevents 
passage of most other proteins. 


■ What type of proteins would you predict to be transported into the
nucleus from the cytoplasm?


□ Histones, which are needed for the packing of DNA; the enzymes and
other proteins needed for replication and transcription of DNA; the
proteins needed for ribosome assembly; and the proteins that make up the
nuclear lamina.


The nucleus, then, is crucial as the site of DNA replication, transcription of 
DNA into RNA and ribosome assembly. You have seen that the ribosome 
subunits and mRNA molecules leave the nucleus through the nuclear pores. 
The next section describes what happens to them in the cytosol. 


3.4.4 Ribosomes: the sites of protein synthesis 


The mRNA molecules exported from the nucleus are translated into protein in 
the cytoplasm by the ribosomes (which are RNA-protein complexes, not 
organelles). Ribosomal structure is similar in prokaryotic and eukaryotic cells. 
Ribosomes are organised into a large subunit and a small subunit, as shown in 
Figure 3.15 (prokaryotic ribosomes contain over 50 different types of protein 
and those of eukaryotic cells contain more than 80 different types of protein).
The large subunit contains two different ribosomal RNAs in prokaryotes and
three in eukaryotes, whilst
the small subunits contain one ribosomal RNA. 
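
The sedimentation coefficients shown in Figure 3.15 are not additive: a 30S and a 50S subunit combine to give a 70S ribosome, not an 80S one. One way to see why is the rough sketch below, which is an illustration rather than part of the module text. It assumes that every particle behaves as a compact sphere of the same density, in which case S is proportional to mass raised to the power 2/3; real subunits are not spheres, so the numbers are indicative only, and the function name is hypothetical.

```python
def combined_s(s_values):
    """Idealised S value of one compact sphere with the combined mass.

    Assumes S is proportional to mass ** (2/3), so mass is
    proportional to S ** 1.5 (arbitrary units).
    """
    total_mass = sum(s ** 1.5 for s in s_values)
    return total_mass ** (2.0 / 3.0)


prokaryote = combined_s([30, 50])   # small + large subunit
eukaryote = combined_s([40, 60])

print(f"30S + 50S -> about {prokaryote:.0f}S (arithmetic sum would be 80)")
print(f"40S + 60S -> about {eukaryote:.0f}S (arithmetic sum would be 100)")
```

Under this idealised assumption the combined value always comes out below the arithmetic sum of the two S values, which is the qualitative behaviour seen for ribosomes; the exact values depend on shape as well as mass.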


Ribosomes occur either ‘freely’ in the cytosol, i.e. not attached to any organelle
(Figure 3.16), or attached to the membrane of the rough endoplasmic 
reticulum in eukaryotic cells (Figure 3.1a and b). Cytosolic proteins and 
proteins that are destined for the interior of the nucleus, mitochondria, 
chloroplasts and peroxisomes (you will learn about these other organelles later 
in this chapter) are synthesised by the free ribosomes in the cytosol.


Figure 3.13 Schematic illustration of the arrangement of chromosomes with
respect to the nucleolus. The regions of the chromosomes that contain
clusters of genes encoding ribosomal RNA loop to the nucleolus.


Figure 3.14 TEM micrograph showing the surface of the nuclear envelope and
the nuclear pores.


Evidence suggests that some mRNA molecules leaving the nucleus are directed
to specific sites in the cell before they are translated.


[Figure 3.16 labels: polyribosome; single ribosome.]

Figure 3.16 Electron micrograph of ‘free’ ribosomes attached to newly
transcribed mRNA. Several ribosomes joined by an mRNA molecule in this way
constitute what is called a polyribosome, or polysome.


[Figure 3.15 labels (partly illegible):]

                 prokaryotes                   eukaryotes
small subunit    30S: 16S rRNA                 40S: 18S rRNA
large subunit    50S: 23S rRNA, 5S rRNA,       60S: 28S rRNA, 5.8S rRNA, 5S rRNA,
                 34 proteins                   50 proteins
whole ribosome   70S                           80S


Figure 3.15 The ribosome is organised into a small subunit and a large subunit. It
is useful to be aware that the sizes of the subunits and their component ribosomal
RNAs (rRNAs) are described by a so-called sedimentation coefficient ‘S’, which
depends on their relative molecular mass and shape (S stands for Svedberg unit).
The small and large subunits are referred to as 30S and 50S, respectively, for
prokaryotic ribosomes; and 40S and 60S, respectively, for eukaryotic ribosomes.
The complete ribosome is referred to as 70S for prokaryotic ribosomes and 80S
for eukaryotic ribosomes.


Cytosolic proteins, such as cytoskeleton components, are simply released into the
cytosol once their synthesis is complete, but many other proteins are 
delivered to specific locations by a complex system of protein targeting or 
protein sorting. A signal sequence, a characteristic short sequence of
particular amino acids in the protein itself, acts rather like a postal
address that targets the protein for import into the correct organelle. Nuclear
proteins, for example, contain a nuclear localisation sequence and are 
recognised by proteins that transport them to the nuclear membrane and 
through the nuclear pore. Proteins destined for mitochondria, chloroplasts or 
peroxisomes also have a characteristic signal sequence directing them to the 
appropriate organelle. This ‘tagging’ of polypeptides by signal sequences 
comprising specific amino acids is an important general mechanism used by 
all cells to recognise and deliver proteins to specific sites in the cell. 
Prokaryotic cells lack organelles, but they use a similar mechanism to target 
proteins to the cell membranes, cell wall, or to various types of inclusions in 
the cytosol. 
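
The ‘postal address’ idea can be pictured as a simple lookup: a protein made on free ribosomes stays in the cytosol unless it carries a signal that names another destination. The sketch below is only an analogy and is not part of the module text. The signal labels are hypothetical placeholders, not real amino acid sequences, and the function name is invented for illustration.

```python
# Hypothetical signal labels mapped to the destinations named in the text.
# Real signal sequences are specific short runs of amino acids.
DESTINATIONS = {
    "nuclear_localisation": "nucleus",
    "mitochondrial": "mitochondrion",
    "chloroplast": "chloroplast",
    "peroxisomal": "peroxisome",
}


def target_protein(signal=None):
    """Return the destination of a protein made on free ribosomes.

    A protein with no signal is simply released into the cytosol.
    """
    if signal is None:
        return "cytosol"
    if signal not in DESTINATIONS:
        raise ValueError(f"unknown signal: {signal!r}")
    return DESTINATIONS[signal]


print(target_protein())                        # no signal sequence
print(target_protein("nuclear_localisation"))  # nuclear protein
```

The default case matters as much as the lookup: in this picture, remaining in the cytosol needs no signal at all, whereas every other destination depends on the protein carrying the right ‘address’.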


As you will learn in the next section, lysosomal proteins (Section 3.4.9), 
proteins that are destined for export from the cell, cell membrane proteins, 
and many of the organelle membrane proteins, are translated by ribosomes 
that are attached to the endoplasmic reticulum.




3.4.5 The endoplasmic reticulum 


The nuclear membrane is continuous with an extensive membrane system 
known as the endoplasmic reticulum (ER, see Figure 3.1 and Figure 3.11), 
which extends through much of the cytoplasm and is the site of many 
activities, including the synthesis of lipids in the smooth endoplasmic 
reticulum and the synthesis of lysosomal proteins, cell membrane proteins and 
secreted proteins in the rough endoplasmic reticulum. Importantly, the interior 
of the ER is separate from the cytosol. The membranes of the ER form tubes 
and sacs, and depending on the plane in which a section cuts through the cell, 
the ER may therefore have the appearance of circular or elongated tubes, or 
parallel systems of membrane, or any shape in between, as illustrated in 
Figure 3.17. 


Some parts of the endoplasmic reticulum appear smooth when viewed by EM 
and are hence named smooth endoplasmic reticulum (SER). Smooth 
endoplasmic reticulum is involved in the production of phospholipids and 
steroids (Book 2, Chapter 2) and also in the detoxification of substances such 
as drugs or ingested chemicals. Ingestion of large amounts of a toxic 
substance, such as a pesticide, results in an increase in the amount of SER in 
cells. Normally, however, most of the endoplasmic reticulum appears granular, 
being dotted with many ribosomes (Figure 3.17). This part of the endoplasmic 
reticulum is given the name rough endoplasmic reticulum (RER), and is the 
site where lysosomal proteins (mainly digestive enzymes), membrane proteins, 
and proteins that are destined for export from the cell are synthesised and 
processed. 




Figure 3.17 (a) Electron micrograph showing rough endoplasmic reticulum (RER) in a rat liver cell. The granular 
appearance is due to ribosomes that are attached to the endoplasmic reticulum membrane. (b) Schematic diagram of 
rough endoplasmic reticulum and smooth endoplasmic reticulum; note that the appearance can be as elongated or 
spherical tubes, or something in between. 




What determines whether a particular mRNA molecule is translated by 
ribosomes that are free in the cytosol or by ribosomes that are attached to the 
RER? The answer lies again in the amino acid sequence of the protein being 
synthesised. In fact, translation of all eukaryotic mRNAs starts on ‘free’ 
ribosomes in the cytosol (Figure 3.18a), but if an mRNA encodes a protein 
that is destined for lysosomes, or to be secreted from the cell, or to be 
embedded in a cell membrane, the ribosome is rapidly redirected to the 
endoplasmic reticulum by a process known as cotranslational localisation. 
As soon as the beginning of the translated polypeptide (which is known as the 
N-terminus, or amino (NH2) terminus) starts to emerge from the free
ribosome, a signal sequence for localisation to the RER, close to the N- 
terminus, is recognised by a protein complex called the signal recognition 
particle (SRP). Translation pauses, and the whole complex, including the 
ribosome, mRNA and partially translated polypeptide, is transported to the 
membrane of the RER. There, translation recommences and the growing 
polypeptide chain is simultaneously translocated across the membrane of the 
RER and starts to enter the RER lumen, the space inside the RER
(Figure 3.18c).


One of two things may then happen: for some proteins, the translated 
polypeptide continues to enter the RER lumen, until finally the completed 
polypeptide is released inside the RER. These proteins usually pass on to the 
Golgi apparatus (see Section 3.4.7) and are eventually secreted from the cells 
or packaged into lysosomes. Alternatively, some proteins remain partly 
embedded within the RER membrane. 


■ Thinking of how polypeptides are ‘directed’ to the RER, suggest a
molecular mechanism that would determine whether a particular protein
remains in the RER membrane, or enters the RER lumen.


□ A protein that is destined to remain in the RER membrane could have a
special ‘stop’ sequence of amino acids, which remains in the membrane
and thus prevents the protein from completely entering the lumen.


That is indeed what happens; a stop-transfer signal halts translocation and the 
protein remains embedded in the ER membrane. These proteins are either 
destined to be delivered to other membranes, usually the cell membrane, or 
remain in the ER, where they play a role in ER function. Some proteins, many 
of them receptor proteins, are ‘multipass’ transmembrane proteins. One 
example, rhodopsin, is a light-sensitive receptor protein involved in light 
perception in the retina of the mammalian eye. It has seven membrane- 
spanning sections or domains. You will learn more about the role of similar 
transmembrane proteins in cell signalling in Chapter 4 of Book 2. A signal 
sequence at the N-terminus of the rhodopsin protein acts as a start-transfer 
signal which directs translocation of the growing polypeptide chain through 
the ER membrane until a stop-transfer signal is reached (Figure 3.18d). A 
second start-transfer signal sequence further along the growing polypeptide 
chain threads another section of the protein across the membrane until it 
reaches a second stop-transfer signal. Alternating start- and stop-transfer 
signals allow the rhodopsin protein to thread back and forth through the 
membrane seven times. 
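

The alternation of start- and stop-transfer signals can be followed with a toy
model. The Python sketch below is an illustration only, not a description of
the real translocation machinery. It assumes that the stretch of polypeptide
made after a start-transfer signal is threaded into the RER lumen, and that the
stretch made after a stop-transfer signal stays on the cytosolic side. The
function name `segment_locations` and the representation of signals as the
words "start" and "stop" are invented for this example.

```python
def segment_locations(signals):
    """Return where the stretch of chain following each signal ends up.

    Toy assumptions (not from any real software or database):
    - signals are given in the order they emerge from the ribosome;
    - they must alternate, beginning with a start-transfer signal;
    - chain made after "start" enters the RER lumen,
      chain made after "stop" remains in the cytosol.
    """
    locations = []
    expected = "start"
    for signal in signals:
        if signal != expected:
            raise ValueError(
                f"expected a {expected}-transfer signal, got {signal!r}"
            )
        locations.append("RER lumen" if signal == "start" else "cytosol")
        # The next signal must be of the other kind.
        expected = "stop" if signal == "start" else "start"
    return locations


# Hypothetical order of signals in a multipass membrane protein.
example = ["start", "stop", "start", "stop"]
print(segment_locations(example))
```

Reading the output from left to right shows the chain switching sides of the
membrane each time a new signal emerges from the ribosome.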




Figure 3.18 An overview of protein localisation in eukaryotes. (a) During translation, all protein synthesis is 
initiated on a free ribosome in the cytosol. The N-terminus of the protein (denoted NH here) emerges from the 
ribosome. (b) In the absence of an RER localisation signal sequence, translation continues on the free ribosome and 
the polypeptide is released into the cytosol once translation is complete. It may then be post-translationally relocated 
to somewhere else in the cell. (c) The presence of a specific RER signal sequence at the beginning of the polypeptide 
directs binding of the whole ribosome-mRNA-polypeptide complex to the RER membrane. The polypeptide chain is 
synthesised across the RER membrane and into the RER lumen (cotranslational localisation). (d) Transmembrane 
proteins are retained in the RER membrane by alternating translocation start-transfer and stop-transfer sequences that 
thread the polypeptide through the membrane in sections. 
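

The choices summarised in Figure 3.18 can be written as a short decision
procedure. The Python sketch below is a deliberate simplification: it reduces
each signal to a yes/no flag, whereas in the cell the signals are stretches of
amino acids in the polypeptide. The function name `localise` and its two
parameters are hypothetical.

```python
def localise(has_rer_signal, has_stop_transfer=False):
    """Return the compartment in which translation of a protein is completed.

    Both arguments are simplified yes/no flags standing in for
    amino acid signal sequences (an assumption of this sketch).
    """
    if not has_rer_signal:
        # Figure 3.18b: translation finishes on a free ribosome.
        return "cytosol"
    if has_stop_transfer:
        # Figure 3.18d: the protein stays embedded in the membrane.
        return "RER membrane"
    # Figure 3.18c: the whole polypeptide enters the lumen.
    return "RER lumen"


print(localise(has_rer_signal=False))
print(localise(has_rer_signal=True))
print(localise(has_rer_signal=True, has_stop_transfer=True))
```

A protein that is released into the cytosol may still be moved elsewhere
afterwards (post-translational localisation); the sketch covers only the
decision made during translation.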


The proteins translated in the RER are then modified in different ways. One 
important modification that begins in the RER is protein glycosylation; that is, 
the addition of sugar residues; glycosylated proteins are known as 
glycoproteins (Figure 3.7). Some sugar residues can also serve as part of the 
‘address label’, and are involved in protein targeting later on; others have
different roles. The initial glycosylation of proteins occurs in the RER,
through the action of enzymes that are embedded within the RER membrane. 
The modified proteins that are destined for export, or for delivery to
lysosomes or the cell membrane, then pass from the RER to the Golgi
apparatus (see Section 3.4.7) for further processing.


3.4.6 Vesicles 


The proteins and glycoproteins leaving the RER are transported to the Golgi 
apparatus in small spherical membrane-bounded compartments, known as 
transport vesicles, which pinch off from the RER. The membrane of the 
transport vesicle fuses with the Golgi membrane, so proteins inside the vesicle 
are released into the Golgi lumen while proteins embedded in the vesicle 
membranes are incorporated into the Golgi membrane. After further 
processing, sorting and packaging in the Golgi apparatus, the modified 
proteins are again packaged into vesicles (Figure 3.19b) and delivered to their 
final destination in a similar manner. Secretory cells usually have large 
numbers of secretory vesicles which fuse with the cell membrane to release 
their contents outside the cell. There are many different types of vesicle, even 
in cells that are not specialised for secretion. Vesicles are also used to excrete 
waste materials and to transport many different molecules within cells, not just 
proteins and glycoproteins. Newly synthesised lipids from the SER are 
incorporated in vesicle membranes and are therefore also delivered to the cell 
membrane in these small structures. 


How do vesicles move within cells, and what mechanism ensures their correct 
delivery? The surfaces of vesicles have different coatings and surface signals 
that identify their contents and ensure that they fuse with the correct target 
membrane. Vesicles can move short distances within the cytosol by diffusion, 
but movement to more distant sites in the cell is mediated by the action of 
special ‘motor’ proteins that are able to ‘walk’ vesicles along the cytoskeleton 
network (you will learn more about this type of movement in Book 2, 
Chapter 5). 


3.4.7 The Golgi apparatus 


The main functions of the Golgi apparatus are: to sort the proteins and lipids 
arriving from the ER according to any ‘labels’ they carry; to complete the 
glycosylation (and some other modifications) of proteins; and finally, to 
package the proteins and lipids into vesicles according to one of three main 
destinations: (1) the lysosomes, (2) the cell (plasma) membrane, or (3) 
secretion from the cell. 


When viewed in the electron microscope, the Golgi apparatus appears as 
stacks of smooth, flat membranous sacs, known as cisternae, which are 
slightly curved and surrounded by vesicles of various sizes (Figure 3.19). The
Golgi often has a relatively inconspicuous appearance compared to the RER 
and its size varies in different cell types. 


Figure 3.19 (a) Electron micrograph of the Golgi apparatus from a rat liver cell. The flattened cisternae can be seen.
(b) Schematic drawing of the Golgi apparatus, showing the cisternae and associated transport vesicles.


■ Why might the size of the Golgi be different in different types of cells?


□ Different cells secrete different amounts of proteins and lipids. Some
cells, such as hormone- or enzyme-secreting cells, secrete
large quantities of these proteins and so are specialised for their
production and packaging for export. Other types, such as smooth muscle
cells, export very little protein.


■ Would you expect a cell that does not secrete any lipids or proteins to
have a Golgi apparatus?


□ Yes, all cells need at least small quantities of lipid and protein to be
delivered to their cell membrane. So, all eukaryotic cells have a Golgi
apparatus.


3.4.8 Protein secretion from the cell 


Intestinal epithelial cells, which you learnt about in Chapter 2, are a good 
example of the diversity of secretory function. The intestinal epithelium 
contains many cells that are specialised for absorption (and have a small Golgi 
apparatus and only a few secretory vesicles), but it also includes several 
different types of cell that are specialised for secretion. Some of these cells 
secrete digestive enzymes from their apical surface which is exposed to the 
intestinal lumen (where ingested food is located); different enzymes are 
secreted in different parts of the gastrointestinal tract. Cells of another type 
secrete heavily glycosylated proteins known as proteoglycans into the lumen 
of the intestine. These mix with water to form mucus which acts as a 
lubricant, aiding the passage of material along the gut. Yet other epithelial
cells secrete local hormones from the other surfaces of the cell (the
basolateral sides, i.e. base (basal) and side (lateral)) that are in contact with
other cells. These hormones are secreted either into blood vessels or close to
the surface of nearby cells.


Figure 3.20 Schematic illustration showing cell polarity in intestinal
epithelial cells. Absorptive cells have a large area of many microvilli on the
apical surface facing the gut lumen; these increase the area available for
absorption. The mucus-producing ‘goblet’ cells have many secretory vesicles
which convey proteoglycans to the large invagination, or ‘goblet’, which opens
into the gut lumen. Hormone-producing cells have only a small part of their
surface exposed to the gut lumen; their primary role is the detection of
nutrients in the gut lumen, and the appropriate secretion of hormones into
blood vessels underlying the opposite (basal) side of the cell.



Each type of intestinal epithelial cell is therefore polarised; it is not uniform, 
either in shape or function, as shown schematically in Figure 3.20. 


Another factor that is important to think about when considering the secretion 
of molecules from cells is the difference in demand for different types of 
exported molecules. For example, some substances are constantly delivered to 
the cell membrane, and are continuously released in small quantities all the 
time. This sort of release is known as constitutive secretion. An example is the 
secretion of mucus from intestinal epithelial cells. Other molecules, however, 
are only released at certain times, in response to some kind of signal. This 
type of release is known as regulated secretion. An example of regulated 
secretion is that of gastrointestinal hormones and digestive enzymes, which 
are released in response to ingestion of food. 


Within each of these three types of intestinal epithelial cell, a complex sorting 
and delivery system exists. All the membrane-associated and secretory 
proteins are synthesised in the RER, processed and packaged in the Golgi, and 
then they continue by transport in vesicles along the cytoskeleton to the cell 
membrane. What about the mechanism by which molecules are actually 
released from cells? This involves a process known as exocytosis. During 
exocytosis the vesicle membrane fuses with the cell membrane, releasing the 
vesicle contents outside the cell. 


■ What is the general term for the converse process, in which extracellular
materials are ingested by engulfment by extensions of the cell membrane
which form into vesicular structures within the cytosol?


□ The term is endocytosis (Section 2.4.1).


You have already learnt that organisms such as amoebae feed by endocytosis 
(sometimes referred to as phagocytosis when it involves larger particles such 
as bacteria). Cells of the immune system, known as phagocytes, ingest 
bacteria in this way. You will learn more about exocytosis and endocytosis,
and other ways in which substances cross cell membranes, later in Book 2,
Chapter 2. What is the fate of the materials that the cell ingests? They are
broken down by yet another type of organelle, the lysosome. 


3.4.9 Lysosomes and peroxisomes 


Lysosomes are small spherical organelles, enclosed by a single membrane, 
which are common in animal cells but rare in plant cells. They measure about 
0.5-1.0 µm across, and they contain digestive enzymes. Lysosomes fuse with
membrane-bound endosomes (containing nutrients ingested by endocytosis), 
and the lysosomal enzymes digest large nutrient molecules. Cells can also 
degrade and recycle the components of their own organelles and structures 
when they are old or damaged, or if the cell is ‘starving’ in the absence of 
nutrients. This process, known as autophagy, usually involves formation of a 
membrane around the cell component and fusion of the resulting vesicle with 
lysosomes. Autophagy is thought to have an important role in many processes 
including cell growth, cell death and infection. Lysosomes contain many 
different kinds of digestive enzymes, and the inside of the lysosome is acidic, 
with a pH of around 5. The enzymes of the lysosome are specialised to
perform their function only at this low pH; so, should they leak out into the
cytosol, which has a pH of around 7.2, they do not do a great deal of damage. 
The inside of the lysosome is made acidic by the action of specialised 
transport proteins that lie in the lysosomal membrane and ‘pump’ hydrogen 
ions into its lumen. Other lysosomal membrane proteins transport the useful 
products of digestion out of the lysosome into the cytosol. 
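

To get a feel for how different these two environments are, the pH values can
be converted into hydrogen ion concentrations. The short Python sketch below
uses the standard definition of pH (the negative logarithm, to base 10, of the
hydrogen ion concentration in mol per litre), which is assumed here rather than
taken from this chapter. The pH values of 5 and 7.2 are the approximate figures
quoted above; the function and variable names are invented for this example.

```python
def hydrogen_ion_concentration(ph):
    """Return the hydrogen ion concentration (mol per litre) for a given pH.

    Uses the standard definition pH = -log10([H+]).
    """
    return 10 ** (-ph)


lysosome = hydrogen_ion_concentration(5.0)  # approximate lysosomal pH
cytosol = hydrogen_ion_concentration(7.2)   # approximate cytosolic pH

fold_difference = lysosome / cytosol
print(f"Lysosome lumen: {lysosome:.1e} mol per litre")
print(f"Cytosol:        {cytosol:.1e} mol per litre")
print(f"Roughly {fold_difference:.0f}-fold more hydrogen ions in the lysosome")
```

On these figures the lysosome interior holds more than a hundred times the
hydrogen ion concentration of the cytosol, which gives a sense of the gradient
that the membrane ‘pumps’ have to maintain.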


■ What useful materials would be produced by digestion in the lysosome?


□ The products would depend on the starting material, but could be amino
acids, sugars and nucleotides.


Peroxisomes are also small enzyme-containing organelles bound by a single 
membrane, which are very similar in size to lysosomes, measuring between 
0.2 and 1.0 µm in diameter. They are thought to be present in all eukaryotic
cells. In mammals, they are particularly plentiful in liver cells and adipocytes 
(Table 2.1) but are much less abundant in other cells. Like lysosomes, 
peroxisomes also have a role in metabolism; they contain enzymes that break 
down fatty acids and amino acids, resulting in, among other things, the 
production of the toxic substance, hydrogen peroxide. Peroxisomes therefore 
also contain high levels of an enzyme known as catalase which breaks down 
the hydrogen peroxide into harmless products (water and oxygen). Lysosomes 
and peroxisomes are shown in Figure 3.21.




Figure 3.21 Electron micrographs showing (a) lysosomes and (b) peroxisomes 
(dark-staining structures) seen in liver cells. 


For many years, peroxisomes were thought to be identical to lysosomes in 
their properties. The difference between these organelles was discovered as the 
result of cell fractionation experiments (Box 3.1). Although very similar in 
size, their contents and therefore their densities are different, so under 
particular centrifugation conditions the two organelles sediment in different 
fractions. Different enzymes were found to be associated with the two 
fractions and it is now known that the two organelles are very distinct. 


■ What is the difference between the site of synthesis of lysosomal and
peroxisomal proteins?


□ Lysosomal proteins are synthesised at the RER (Section 3.4.5);
peroxisomal proteins are synthesised by free ribosomes in the cytosol
(Section 3.4.4).


■ How are the correct proteins delivered from the cytosol to the
peroxisomes?


□ Peroxisomal proteins have a signal sequence to ensure their correct
targeting.


Correct targeting of peroxisomal proteins to the peroxisomal membrane or 
interior occurs because of specific sequences in the proteins, as described 
previously. 


Having studied the organelles and other cellular components involved in the 
synthesis and delivery of proteins in the cell, in the next section you will 
consider an organelle that you already know something about, the 
mitochondrion, which plays a fundamental role in the generation of ATP in 
eukaryotic cells. 


3.4.10 Mitochondria 


Mitochondria have a distinctive appearance when viewed by electron 
microscopy. They often appear as rounded or sausage-shaped structures 
(Figure 3.1a, b and Figure 3.22a, b), measuring about 0.5-1.0 µm in diameter
and 2-8 µm in length, although their size and shape vary, and they are often
much bigger in plants. For many years all mitochondria were assumed to have
this fairly regular shape, but recent research, in which mitochondria have been 
observed in living cells, has shown that mitochondria are dynamic; they divide 
and fuse, and the shapes of individual mitochondria change. Mitochondria also 
move; their positions in the cell are not fixed. In many cells, mitochondria are 
long and extended and have a network-like structure (Figure 3.22c). So, rather 
like the ER, the appearance of mitochondria as round or elongated and 
flattened spheres in micrographs is often simply due to the plane in which 
they are sectioned. 


Mitochondria have a double membrane; the inner membrane is thrown into 
many folds known as cristae, which are what gives the organelle its 
distinctive appearance. The two membranes effectively create two 
compartments in the mitochondrion. The intermembrane space lies between 
the two membranes; the much larger mitochondrial matrix lies within the 
inner membrane, as shown in Figure 3.22b. 


The mitochondrial matrix contains many enzymes, including some of those 
involved in the breakdown of glucose (Book 2, Chapter 3). It also contains 
mitochondrial DNA, and the molecules and cellular components, including
ribosomes, needed to transcribe and translate mitochondrial genes.
Mitochondrial genes encode some of the proteins of the inner mitochondrial
membrane, which plays a fundamental role in the generation of ATP. The
majority of mitochondrial proteins, however, are encoded by the cell’s nuclear 
DNA and are synthesised in the cytosol or RER and imported into the
mitochondrion. Again, the correct delivery of mitochondrial proteins from the
cytosol depends on specific signal or targeting sequences.


Figure 3.22 (a) Electron micrograph of mitochondria from a rat liver cell. The double membrane and cristae are
clearly visible. (b) Schematic diagram of a mitochondrion. (c) Fluorescent light micrograph of cultured human cells.
The nuclei are labelled blue and the mitochondria red. The image shows the irregular shapes and network-like
arrangement of the mitochondria in these cells.


Mitochondria produce most of the ATP in eukaryotic cells, and are hence 
more abundant in cells that have high requirements for energy, such as muscle 
cells. 


3.4.11 Chloroplasts 


Chloroplasts, the sites of photosynthesis, are found in plant cells and 
photosynthetic algae. Like mitochondria, chloroplasts have their own 
ribosomes and their own DNA, which encodes some of the proteins of the internal
membrane that use light energy to generate ATP in photosynthesis (Book 2,
Chapter 3). Most other chloroplast proteins are, however, encoded by the 
nuclear DNA and must be transported into the chloroplasts. Chloroplasts are 
typically about 2-4 µm wide and 5-10 µm in length, and have three
membrane systems. 


In addition to an outer and inner membrane (which encloses a space filled by 
a fluid substance known as the stroma, described below), chloroplasts have a 
complex structure of internal membranes (here referred to as lamellae; 
singular, lamella). The lamellar structure consists of flattened sacs called 
thylakoids, which consist of a thylakoid membrane surrounding a thylakoid 
lumen (Figure 3.23a and b). The thylakoids are often stacked into grana
(singular granum), which look rather like stacks of pancakes. The grana are 
linked by stroma lamellae, which join the stacks of grana together to form a 
single compartment. The thylakoid membranes and stroma lamellae have 
embedded within them the chlorophylls and other light-absorbing pigments 
required for photosynthesis (Book 2, Chapter 3). 


The chloroplast stroma is the site of many important reactions, including the
conversion of carbon dioxide into sugars, the production of starch from sugars
(starch granules are often seen in chloroplasts) and the synthesis of some
chloroplast proteins (there are ribosomes, DNA and RNA within the stroma).


Figure 3.23 (a) Electron micrograph of a chloroplast. A, double outer membrane
or envelope (the outer and inner membranes are labelled separately in (b)); B,
stroma; C, stroma lamellae; D, granum, composed of a stack of thylakoids.
(b) Schematic diagram of chloroplast structure.


3.4.12 Vacuoles 


An organelle that is very prominent in all plant and fungal cells (but usually 
not in animal cells) is the vacuole (Figure 3.1b). Vacuoles can be very large 
and they have a number of roles. In mature plant cells, the vacuole is typically 
a membrane-bound space that fills most of the centre of the cell and is 
basically a storage compartment for water, ions and small organic molecules. 
In combination with the cell wall, vacuoles also have a role in determining the 
shape of the plant cell. A cell in which the vacuole is completely full of water 
is said to be turgid, and this cell turgidity ensures that the plant remains rigid 
and upright. In dry conditions, loss of cell turgidity results in plant wilting. 
Water molecules pass freely in and out of the vacuole, but the vacuolar 
membrane (the tonoplast) controls the passage of other molecules with 
specialised transporters that are different to those of the cell membrane. 


Vacuolar pH is lower than that of the cytosol, and vacuoles can contain 
enzymes that digest large organic molecules in a similar way to lysosomes in 
animal cells (lysosomes are rare in plant cells). 


3.4.13 Organelle biogenesis and additional roles 


There is not sufficient space in this chapter to describe all the properties of 
organelles in detail, so before leaving the topic, it is important to make two 
general points: firstly, organelles are dynamic structures; and secondly, they 
have additional roles to those described above. 


New organelles form by the growth and division of existing organelles, 
processes that are independent of nuclear division (Chapter 4 of this book). 
The formation of new organelles requires the production of new 
phospholipids, most of which are assembled in the SER. The new 
phospholipids are then incorporated into the different organelles in two ways. 
The first is by fusion of vesicles derived from the ER; the Golgi and 
lysosomes form in this way. The second is by direct incorporation of 
phospholipids into the organelle membrane via special membrane proteins; the 
mitochondria, chloroplasts and peroxisomes incorporate new phospholipids in 
this way, as they do not fuse with vesicles. 


In addition to their main roles described in this chapter, many organelles are 
also involved in other functions. One such function is regulated cell death, 
often referred to as programmed cell death, one form of which is known as 
apoptosis, which you will learn about in Book 3, Chapter 1. Programmed cell 
death is of importance in ensuring appropriate cell numbers, particularly 
during the development of animals and in disease. The cell membrane, 
mitochondria and the ER are all involved in programmed cell death. 


Another function of the ER and the mitochondria is in the regulation of the 
levels of calcium in the cell. Calcium plays an important role in cell 
signalling, as you will find out in Book 2, Chapter 4. 


Summary of Section 3.4 


• Eukaryotic cells are surrounded by a cell membrane, which is composed of
a phospholipid bilayer in which proteins and glycoproteins are embedded.
The membrane forms a barrier between the cell and its external
environment; membrane proteins and glycoproteins mediate selective
transport of substances into and out of the cell, and interactions with the
extracellular environment, including neighbouring cells.

• The eukaryotic cytoskeleton is composed of three types of filament-like
structures assembled from cytoskeletal proteins: microfilaments (also
known as actin filaments), microtubules and intermediate filaments. The
cytoskeleton collectively provides shape, support and an internal transport
system, and is responsible for cell movements.

• The nucleus, which has a double membrane (or nuclear envelope), contains
most, but not all, of a cell’s DNA, which is packaged by binding to
histone proteins to form chromatin. DNA replication and also transcription
to synthesise RNA take place in the nucleus, as does the assembly of
ribosomes. RNA and assembled ribosome components leave the nucleus,
and structural and ribosomal proteins and the enzymes needed for DNA
replication and transcription enter it, through pores in the nuclear envelope.

• Messenger RNA is translated by ribosomes to synthesise proteins.
Cytosolic proteins and proteins destined for the nucleus, mitochondria,
chloroplasts and peroxisomes are synthesised by ribosomes that are free in
the cytosol. Proteins destined for these organelles carry amino acid signal
sequences that target them for recognition by special proteins that transport
them to the appropriate cell compartments.

• Proteins destined for lysosomes, for secretion from the cell, or for
embedding in the cell membrane have signal sequences that relocate the
ribosome on which they are being translated to the rough ER, where their
translation is completed. Glycosylation of proteins also starts in the RER.

• The smooth ER is the site of phospholipid assembly and also
detoxification (for example of drugs or toxins).

• Proteins and lipids pass from the RER to the Golgi apparatus for further
processing (e.g. glycosylation), and sorting and packaging into vesicles,
which deliver proteins and lipids to lysosomes or to the cell membrane
(either for embedding in the cell membrane, or secretion from the cell).

• Vesicles move short distances within the cell by diffusion, but are
transported longer distances by movement along microtubules. Movement
of vesicles to specific sites in the cell is achieved because the vesicles
have special coats and surface signals that identify their contents and
ensure they fuse with the appropriate membrane.

• Substances are secreted from cells by exocytosis in which vesicles fuse
with the cell membrane. One way in which substances are imported into
cells is by engulfment into a cell membrane-bound vesicle. This process is
known as endocytosis.

• Ingested materials and ‘old’ organelles are digested within lysosomes.
Some fatty acids and amino acids are broken down in peroxisomes.

• Mitochondria are the site of the majority of ATP production. They have a
double membrane, the inner membrane of which is folded; and they also
have their own DNA and ribosomes which synthesise some of the proteins
required for ATP synthesis. Most mitochondrial proteins are, however,
encoded by the nuclear DNA and are imported into the mitochondria.

• Plant cells have a cell wall and two organelles not found in animal cells:
chloroplasts, which are the site of photosynthesis, and the vacuole.
Chloroplasts have three membrane systems: outer, inner and internal
thylakoid membranes. The complex internal membrane system is where
photosynthesis takes place. Many other activities including synthetic
reactions take place in the chloroplast stroma. The vacuole is the site
where water, ions and small organic molecules are stored. It provides
turgidity to the cell and may contain enzymes that digest large organic
molecules.

• New organelles form by growth of pre-existing organelles, followed by
division. These processes are independent of nuclear division.

• Organelles have a number of other functions, including roles in
programmed cell death and calcium signalling.


3.5 Cell interactions in tissues 


In Chapter 2 you learnt that in multicellular organisms, cells tend to be 
organised into groups, or tissues. This section considers the structural 
components that hold the constituent cells of tissues together. These are cell 
junctions, and an important component of all tissues, the extracellular matrix 
(which so far have only been mentioned in passing). The focus here is on 
animal tissues. 


3.5.1 Extracellular matrix 


The extracellular matrix is composed of proteins and polysaccharides that are 
secreted by cells and assembled into an organised meshwork external to the 
cells. The amount and type of extracellular matrix varies from tissue to tissue, 
and different types of extracellular matrix perform different functions. In many 
tissues, it acts rather like cement between cells; in others, it forms a support 
or scaffold. The extracellular matrix is not merely a scaffold, however, but can 
have an active role in regulating cell behaviour, and the communication 
between cells. Connective tissues (Section 2.4) have a relatively large 
proportion of extracellular material which determines their properties: for 
example, the rigidity of bone is due to an extracellular matrix composed of 
tough collagen fibres that is hardened by mineral salts (see below). 


In most animal tissues, the extracellular matrix is secreted mainly by cells 
called fibroblasts, and the major component is a gel-like substance in which 
proteins are embedded. The gel is composed of polysaccharides called 
glycosaminoglycans (GAGs) which are often attached to proteins to form 
heavily glycosylated proteins called proteoglycans. The GAGs are negatively 
charged, and attract positively charged ions, particularly sodium ions. The 
high concentration of positive ions leads to the movement of water by 
diffusion into the matrix, giving it its gel-like properties, including the ability 
to resist compression. 


Fibrous proteins and other substances within the extracellular matrix provide 
additional strength or elasticity. The most abundant extracellular protein in 
animal tissues is collagen, of which there are several types. Collagen is a very 
tough fibrous protein that can, in its different forms, provide great strength 
and/or flexibility (Figure 3.24). Elastin, as its name suggests, is a flexible 
protein that is found in the extracellular matrix of tissues such as blood 
vessels. In bone, mineral salts are laid down in connective tissue to
provide support.


Some cells are linked to particular proteins embedded in the extracellular
matrix. For example, an extracellular matrix protein known as fibronectin
binds to cell membrane glycoproteins known as integrins. On the inside of
the cell, the integrins are linked to actin filaments of the cytoskeleton, as
shown in Figure 3.25. Integrins can transduce information from the
extracellular matrix to the cytoskeleton, which may then stimulate the cell to
change shape, move or respond in other ways to certain changes in its
external environment.


Figure 3.24 Collagen fibrils (fine fibres) in the mouse intestine. During
sectioning, some fibrils have been cut transversely, others along their long axis.
A type of connective tissue cell known as a fibroblast, which produces collagen, is
also visible. Scale bar: 500 nm.


Figure 3.25 Schematic diagram showing how the extracellular matrix is linked
to some cells, via integrin molecules, which span the cell membrane. Outside
the cell, integrin links via fibronectin (a protein link) to collagen. Inside
the cell, another type of protein links integrin to actin filaments.


Interactions between cells and the extracellular matrix, then, play important 
roles in maintaining tissue structure, and also in cell properties such as 
adhesion and migration. 


3.5.2 Cell junctions 


Cell-cell contacts also play extremely important roles in tissue function. For
example, in some tissues it is essential that there is very close contact between
cells so that an effective barrier is set up and pathogenic microbes, and even
harmful or unwanted molecules, cannot readily pass between the cells.


■ Give an example of circumstances where close contact between adjacent
cells is important.


□ One example is in the epithelial tissue that lines the gut wall, where it is
important that there are no large spaces between the cells, where harmful
microbes in ingested food could enter the body. Another example is the
skin, which also has a protective role.


Cells may have a number of different types of specialised cell-cell contacts
known as cell junctions. Epithelial tissue, which is composed of closely
packed sheets of cells, has several types of cell junction, each with specific
functions and formed by different specialised proteins. Examples of the
arrangement of the main types of junction between epithelial cells are shown
in Figure 3.26.


Figure 3.26 Schematic diagram showing cell junctions and their functions in 
epithelial tissue. Epithelial cells are characterised, in part, by their close packing. 
They are held together by specialised membrane proteins that form specialised 
junctions between adjacent cells. 


Gap junctions are a type of specialised cell junction that allows intercellular 
communication. In the smooth muscle of the intestine, which must contract in 
a coordinated manner for peristalsis to occur, the smooth muscle cells have 
gap junctions which allow transmission of molecules and electrical signals 
between the adjacent smooth muscle cells. 


Tight junctions, as their name suggests, link cells very closely, forming a
barrier that prevents the passage of molecules and ions between cells. They
also prevent the movement of integral membrane proteins within the cell
membrane. Hence they are important in maintaining, for example, the polarity
of the epithelial cells of the intestine (Figure 3.20).


Adherens junctions link the actin cytoskeleton of adjacent cells together, 
often forming a belt-like arrangement around each of the cells in an epithelial 
sheet. The link occurs via transmembrane proteins known as cadherins, and 
intracellular proteins that link cadherins to the cytoskeleton. 


Anchoring junctions (called desmosomes) link adjacent cells together, while
hemidesmosomes link cells to the extracellular matrix. In desmosomes the
links again occur via cadherins, whereas hemidesmosomes use integrins. In
both cases, however, the link is to intermediate filaments of the cytoskeleton
rather than actin filaments.


There are other specialised types of connections between animal cells, such as
the contacts between neurons and their targets, known as synapses. In plants,
there are the plasmodesmata, which you encountered in Section 2.4.3. These
channels, lined by a cell membrane, make a connection between the cytoplasm
of plant cells (through a gap in the cell walls), enabling transport and
communication between the cells.


(LOs 3.1 to 3.4) Allow 3 hours for this activity 
In this activity you will view images of different cell types to enhance your 
appreciation of cell diversity and your understanding of how the structure of 
cells relates to their functions. 


Summary of Section 3.5 


• The cells of animal tissues are held together by the extracellular matrix
and by cell junctions.


• The extracellular matrix in animals is composed of a hydrated mixture of
glycosaminoglycans (GAGs) and proteoglycans, in which various proteins,
especially fibrous proteins like collagen, are embedded. The nature of the
proteins determines the properties of the matrix, which vary from tissue
to tissue.


• Cells can be linked by several types of cell junctions, formed by
specialised membrane proteins.


• Epithelial cells are characterised, in part, by their very close packing. They
are held together by specialised junctions between adjacent cells.


3.6 Final word 


In this chapter you have learnt about the subcellular components of cells and 
how they are studied. Cells in multicellular organisms, particularly in animals, 
are specialised to perform different functions; the properties, abundance and 
distribution of the cytoskeleton and organelles allow, and indeed underpin, this 
functional and structural diversity. Your study of this chapter has provided you 
with a general understanding of cellular architecture and its relationship to 
different processes in the cell. This sets the scene for later chapters which 
describe how some of these processes occur at a molecular and biochemical 
level. 


The next chapter explores inheritance — how the heritable characteristics of 
cells and organisms are passed from one generation to the next; while 
Chapter 5 describes the role of the genetic material, DNA, and the molecular 
basis of inheritance. 


3.7 Learning outcomes 
3.1 List the main components of cells. 


3.2 Summarise the structure and function of the different components 
of cells. 


3.3 Outline how cell ultrastructure is related to cell function. 


3.4 Identify cell organelles and the main cytoskeletal components in diagrams 
and EM micrographs and interpret simple EM images. 




Chapter 4 Inheritance 


4.1 Introduction 


The previous chapters in this book have highlighted how, despite using similar 
biomolecular components, life on Earth has evolved into an immensely 
diverse range of organisms. The key to evolution is the action of natural 
selection on variation among individuals. Natural selection favours 
characteristics, or traits, that maximise the production of large numbers of 
surviving offspring, thereby increasing the frequency of those favourable 
heritable traits in the next generation, at the expense of less favourable traits. 


If you look at populations of animals and plants around you, you will notice 
physical variation between individuals of the same species, be they humans or 
oak trees. While some of this variation can be attributed to environmental 
factors — for example, poorly fed animals are generally smaller than well-fed 
animals and plants growing in shaded conditions may be smaller than those in 
brighter conditions — many physical characteristics of individuals are largely 
heritable. Clearly, breeds of animals maintain their appearance and children 
resemble parents, but the mechanisms by which physical characteristics of an 
organism are passed on through the generations remained something of a 
‘black box’ until the late 19th century. Despite lacking an understanding of 
the mechanisms of heredity, human populations have always used breeding 
and selection to mould the animals and plants with which they coexist, and on 
which they depend. 


The mysteries behind mechanisms of inheritance began to be unravelled in the 
19th century by the work of Gregor Mendel, an otherwise obscure Austrian 
monk, who observed that certain heritable traits in pea plants followed a set 
pattern. His experiments demonstrated that the inherited traits resulted from
the segregation of discrete units that we now know as genes.


Genetics is the branch of biology that deals with the mechanisms of heredity, 
and the variation of inherited characteristics among related organisms. In the 
present age, the language of genetics has entered the general vocabulary; there 
cannot be many who do not recognise words such as gene, DNA or genome, 
and it is important to understand the principles of genetics behind those 
concepts. This chapter examines the patterns of inheritance identified by 
Mendel and others and will largely concentrate on inheritance in eukaryotic 
organisms, whilst the chapter that follows deals in more detail with the 
molecular basis for inheritance in both prokaryotes and eukaryotes. 


4.2 Genes and genomes 


Mendel conducted his research in complete ignorance of the physical basis of
heredity; chromosomes were not identified as the carriers of heredity until the
early 20th century, and DNA was finally pinpointed as the genetic material by
the 1940s. DNA is a large polymer containing thousands of nucleotides
(Section 1.1.2) and has a number of characteristics that allow it to hold
genetic information. The double-stranded helical structure of DNA results
from pairing between nucleotide bases on the two strands. There are four
different bases, abbreviated as A, G, C and T, and it is the order, or sequence,
in which the bases occur that provides what is popularly referred to as the
DNA ‘code’ (Chapter 5).
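The idea that base pairing lets one strand specify the other can be illustrated with a short sketch. It assumes the standard pairing rules (A with T, G with C), which are not spelled out in this section, and the function name and example sequence are invented purely for illustration.

```python
# Standard DNA base-pairing rules (assumed here; see Section 1.1.2 and Chapter 5).
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(sequence: str) -> str:
    """Return the partner strand, written in the opposite direction.

    The two strands of the double helix run antiparallel, so the
    partner is built by pairing each base and reversing the order.
    """
    return "".join(PAIRING[base] for base in reversed(sequence.upper()))


example = "ATGGCA"  # arbitrary illustrative sequence, not taken from the text
print(complementary_strand(example))  # TGCCAT
```

Because each base has exactly one partner, applying the function twice returns the original sequence, which is one way of seeing how either strand carries the full sequence information.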


The fundamental unit of heredity is the gene. A more descriptive definition 
might be that a gene is a unit of inheritance corresponding to a defined section 
of DNA sequence that encodes a product (often a protein, but sometimes a 
functional RNA, such as a ribosomal RNA) but which also includes more 
extensive surrounding DNA sequences that play a role in controlling how the 
encoded information is used, or expressed. You will learn more about the 
structure and expression of genes in Chapters 5 and 6 in this book. The full 
complement of an organism’s genetic information is known as its genome. 
This consists of the DNA corresponding to genes (as defined above) but often 
also a great deal of other DNA sequence, which was for many years regarded 
as ‘junk DNA’, lacking in function. More recent research is beginning to 
reveal that some of this DNA has important structural and control functions, 
as you will learn in Chapter 5. 


The genetic make-up of an individual, with respect to the specific form of a 
gene or genes it carries, is referred to as its genotype. A particular genotype is
expressed as the phenotype: the observable characteristics or traits of a cell or
organism that depend on the genotype (Box 1.2). The relationships between
genotype and phenotype have been revealed through decades of intense 
genetic research on a number of taxonomically diverse laboratory organisms 
(Box 4.1), which has established some basic hereditary principles that can be 
applied to many different organisms. 


Box 4.1 Model organisms in genetic research 


Many discoveries in genetics have come about using so-called model 
organisms, species that, for a number of reasons, have found favour in 
research laboratories. Classic genetic experiments are carried out by 
‘crossing’ (mating) individuals of known genotype in order to study the 
characteristics of their offspring (or progeny). The duration of the 
reproductive cycle of a model organism is therefore clearly important: 
genetics can test the patience of the experimenter if obtaining the 
progeny of a cross takes too long! Chance (or convenience) has, 
however, often played a part in the choice of model organism. 


Popular model organisms include those listed in Table 4.1 and shown in
Figure 4.1. Generally these were used in genetics research largely
because they were easy and cheap to breed and maintain in the
laboratory. Few were specifically adopted for the purpose of genetic
research; the nematode worm Caenorhabditis elegans is a notable
exception. Proposed in the 1960s as a model for the investigation of
animal development, C. elegans has a relatively small genome and a
conveniently short life cycle. This tiny multicellular organism, about
1 mm long, is transparent and has only 959 body cells (compared with
many trillions of cells in a human body). A technique called RNA
interference (Chapter 6), which can inactivate the expression of
individual C. elegans genes, has enhanced the usefulness of this
organism for genetic analysis. The zebrafish Danio rerio has similarly
become a useful model for studying developmental biology because the
embryo and young fish are transparent and techniques for gene
inactivation are available. The tiger pufferfish (Takifugu rubripes) has
been adopted as a model by genome researchers because it has the
smallest known vertebrate genome, at about 400 × 10⁶ base pairs (pairs
of nucleotide bases linking the two DNA strands, Section 1.1.2).




Figure 4.1 Some of the species used as model organisms in genetics 
research. (a) E. coli; (b) S. cerevisiae; (c) C. elegans; (d) A. thaliana; 
(e) D. melanogaster; (f) M. musculus. 


A key consideration is the extent to which experimental findings made in
one species may be extrapolated to other species, and in particular to
human biology and physiology. Fortunately, the evolutionary
relationships between organisms mean that many fundamental biological
processes are highly conserved, so while the biology of C. elegans
differs greatly from that of mammals, discoveries made in C. elegans
(for example, on the genetics of ageing) have parallels in flies, mice and
humans. Similarly, advances in the genetics of developmental biology
made using Drosophila fruit flies have made a significant impact beyond
Drosophila biology. Several Nobel Prizes for Medicine have been
awarded for genetics-related work on C. elegans and Drosophila.


For the purposes of this chapter the eukaryotic genome is taken to mean the
nuclear DNA, but the term can be applied more widely to include the DNA
of organelles.


Table 4.1 Characteristics of some model organisms used in genetics research.
In the case of the unicellular organisms E. coli and S. cerevisiae, life cycle
refers to the duration of one cell division cycle. In multicellular organisms, life
cycle is the time taken to develop from an egg to a reproductively mature
adult in animals, or from germination to mature seeds in plants. Estimates of
eukaryotic genome size and gene number vary because of the repetitive nature
of the DNA sequences (Chapter 5).


Species                                  Life cycle              Approximate genome    Estimated
                                         duration                size in base pairs    gene number

Escherichia coli (bacterium)             ~20 min per division    4.6 × 10⁶             4000
Saccharomyces cerevisiae (yeast)         ~2 h per division       12 × 10⁶              6000
Caenorhabditis elegans (nematode worm)   ~3 days                 100 × 10⁶             19 000
Drosophila melanogaster (fruit fly)      10 days                 170 × 10⁶             14 000
Arabidopsis thaliana (thale cress)       6 weeks                 100 × 10⁶             25 000
Mus musculus (mouse)                     6-8 weeks               3000 × 10⁶            25 000
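One way to compare the entries in Table 4.1 is to work out how densely genes are packed in each genome. The sketch below is only illustrative: it reads the genome sizes as multiples of 10⁶ base pairs, uses the approximate figures from the table, and the function and variable names are invented for the example.

```python
# Approximate values from Table 4.1:
# species -> (genome size in base pairs, estimated gene number)
TABLE_4_1 = {
    "Escherichia coli": (4.6e6, 4000),
    "Saccharomyces cerevisiae": (12e6, 6000),
    "Caenorhabditis elegans": (100e6, 19000),
    "Drosophila melanogaster": (170e6, 14000),
    "Arabidopsis thaliana": (100e6, 25000),
    "Mus musculus": (3000e6, 25000),
}


def genes_per_million_bp(genome_size_bp: float, gene_number: int) -> float:
    """Average number of genes per million base pairs of genome."""
    return gene_number / (genome_size_bp / 1e6)


for species, (size_bp, genes) in TABLE_4_1.items():
    density = genes_per_million_bp(size_bp, genes)
    print(f"{species}: about {density:.0f} genes per million base pairs")
```

On these rough figures the bacterium has by far the most genes per unit length of DNA and the mouse the fewest, which fits the point made earlier that eukaryotic genomes contain a great deal of DNA sequence other than genes.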


4.2.1 Eukaryotic chromosomes 


Prokaryotes and eukaryotes differ significantly in how their genomes 
are organised. 


■ In general, how does the organisation of the genome differ in prokaryotic
and eukaryotic cells?


□ Prokaryotic genomes are usually composed of a single circular DNA
molecule (Section 3.3), while eukaryotes generally have much larger
genomes consisting of multiple linear DNA molecules, each packaged
with special proteins into individual chromosomes (Section 3.4.3).


Multicellular eukaryotes are made up of two basic cell types which differ in
their DNA content. These are the germ cells, which produce gametes (sperm and
eggs (ova, singular ovum) in animals, and pollen and ova in plants), and the rest
of the body cells, which are known as somatic cells. Gametes are haploid, that
is, they contain a single copy of each type of chromosome. In human gametes,
the haploid chromosome number (n) is 23. In sexually reproducing organisms,
one male gamete and one female gamete fuse to form a single diploid cell
(the zygote) with two copies of each chromosome (2n). The diploid zygote
undergoes multiple rounds of cell division to form a multicellular embryo
which develops into the mature organism. Human diploid cells, i.e. all human
somatic cells, thus have 46 chromosomes.


Each diploid cell in a multicellular eukaryote therefore contains a pair of each
type of chromosome, each pair consisting of one chromosome of maternal
gamete origin and one chromosome of paternal gamete origin. In the case of
diploid human cells there are 23 chromosome pairs (Figure 4.2a and b).
Twenty-two of these are pairs of non-sex chromosomes, known as autosomes,
which are present in both male and female humans. The pairs of autosomes
are homologous chromosomes; they are of equal length and carry an
equivalent set of genes. In addition, there is one pair of sex chromosomes or
heterosomes which differ in males and females. The cells of human females
have a homologous pair of X sex chromosomes (Figure 4.2a), while the cells
of human males have one X and one Y chromosome (Figure 4.2b).
The human Y chromosome is generally smaller and contains far fewer genes
than the X. Sex is similarly determined by XY chromosomes in most other
mammals and also some plants and insects, including Drosophila, which have
three pairs of autosomes and one pair of either XX (female) or XY (male)
heterosomes (Figure 4.2c and d). The characteristic number and appearance of
the chromosomes in the nucleus of a cell from a particular species is called
the karyotype (Figure 4.2).


Figure 4.2 Karyotypes of human and Drosophila. (a) Human female; (b) human
male; (c) D. melanogaster female; (d) D. melanogaster male.


4.3 Cell division, chromosome segregation and 
genetic variation 


The cell’s DNA is replicated each time a cell divides, and one copy is
segregated accurately into each daughter cell. Activity 4.1
compares the different mechanisms of segregation in prokaryotic and
eukaryotic cells.


(LO 4.1) Allow 2 hours 

In this activity, you will learn how cells pass on their hereditary material, 
DNA, when they divide. Prokaryotic cells in general replicate their circular 
genome and divide asexually by binary fission to produce two identical 
daughter cells. 


The division of eukaryotic cells involves the accurate segregation of several 
separate chromosomes and is somewhat more complex, particularly in sexually 
reproducing organisms which generate both haploid and diploid cells. The 
activity describes the eukaryotic cell cycle and the stages of mitotic cell 
division and meiotic cell division in some detail. You will derive most benefit 
from this activity if you study it alongside the brief accounts of these 
processes provided in Section 4.3.1 below. 


4.3.1 Eukaryotic cell division 


There are two mechanisms of eukaryotic cell division, referred to as mitotic 
and meiotic cell division. The first described here is mitotic cell division, 
which generates two daughter cells with the same number of chromosomes as 
the original parent cell. 


Mitotic cell division 

In mitotic cell division, the chromosome pairs are first replicated (copied) and 
then the two sets are divided between two daughter cells. In general (though 
there are exceptions), the diploid somatic tissues of a multicellular eukaryotic 
organism are derived from successive mitotic divisions of the zygote, and then 
maintained by mitotic division of certain cells within the mature tissues. 


Activity 4.1 considered mitotic cell division in some detail, and it will only be
summarised here. Eukaryotic cells undergo a mitotic cell cycle consisting of
four distinct phases (Figure 4.3, middle). The first three phases, G1 (gap 1)
phase, S (synthesis) phase and G2 (gap 2) phase, are collectively known as
interphase. The cell grows during G1 phase, copies or replicates its
chromosomes during S phase, then prepares itself for division during G2
phase.


M phase is generally regarded as consisting of two overlapping processes: 
mitosis, in which the two sets of chromosomes are segregated into two 
separate nuclei, and cytokinesis, in which the cytoplasm divides, separating 
the two daughter cells. Mitosis itself consists of four main phases (Figure 4.3, 
top) which can be readily distinguished by light microscopy. 




Figure 4.3. The mitotic cell division cycle. The cycle consists of four main phases: G1 phase, S phase (during which 
the chromosomes are copied), G2 phase and M phase (during which the chromosome copies are segregated into two 
daughter cells). The stages of M phase (mitosis and cytokinesis) are shown. In this example, the cell shown at 
prophase has two pairs of homologous chromosomes (one long pair and one short pair). Each homologous pair 
consists of one chromosome of paternal origin (blue) and one of maternal origin (red). Because the chromosomes 
have replicated during interphase, each chromosome consists of a pair of identical chromatids. In metaphase, the 
chromosomes attach to the microtubules of the mitotic spindle and line up at the equator of the cell. In anaphase, the 
two chromatids of each chromosome separate and segregate to opposite sides of the cell. In telophase a nuclear 
membrane forms around each new set of chromosomes. Once mitosis is complete the cytoplasm divides (cytokinesis) 
to form two separate cells with two homologous chromosome pairs (one long and one short), the same as the parent 
cell. 




First, the chromosomes, which by this stage have replicated such that each 
chromosome consists of a pair of identical chromatids, condense and become 
visible in the light microscope; this stage of mitosis is known as prophase. 
The nuclear envelope also breaks down during prophase and the microtubules 
of the cell cytoskeleton (Section 3.4.2) start to reorganise to form the mitotic 
spindle. In metaphase, the chromosomes attach to the spindle fibres and line 
up along the centre, or equator, of the cell. During anaphase, the spindle 
microtubules shorten and the chromatids are separated and segregated equally 
into two groups at opposite ends of the cell. In telophase, a new nuclear 
membrane forms around each set of chromosomes. This is the end of mitosis. 
Finally, the cell cytoplasm divides, a process known as cytokinesis, separating 
the cell into two identical daughter cells, each with the same complement of 
homologous chromosome pairs as the parent cell. 


■ Are the daughter cells of the mitosis shown in Figure 4.3 haploid or
diploid?


□ The daughter cells are diploid: they contain two copies of each type of
chromosome.


Meiotic cell division 


Sexual reproduction requires the fusion of haploid gametes from two parents 
to form a diploid zygote cell, which therefore has a combination of genetic 
information from both parents. In order to ensure that the correct chromosome 
complement, or karyotype, is maintained in the zygote, each gamete must 
have half the zygotic chromosome complement, i.e. it must be haploid. 


Haploid gamete cells are generated by the division of special diploid germ 
line cells. These cells undergo a form of cell division known as meiosis. This 
is a reductional division, in which the diploid germ line cell replicates its 
chromosomes which are then segregated twice during meiosis to yield four 
haploid cells (instead of two diploid cells). A comparison of mitosis and 
meiosis is shown in Figure 4.4. 


In meiotic cell division, the diploid parent germ line cell replicates its 
chromosomes during interphase, before meiosis starts. Meiosis involves two 
consecutive nuclear divisions, called meiosis I and meiosis II (Figure 4.4b, 
and described in detail in Activity 4.1). Meiosis I yields two daughter cells, 
which each divide in meiosis II to yield a total of four haploid daughter cells. 
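The bookkeeping of chromosome numbers in the two kinds of division can be sketched in a few lines. This is a deliberately simplified model that tracks only the number of chromosomes per cell, using the human values given in Section 4.2.1; the function names are invented for illustration.

```python
def mitosis(chromosome_number: int) -> list:
    """One round of replication, one division: two cells, number unchanged."""
    return [chromosome_number, chromosome_number]


def meiosis(chromosome_number: int) -> list:
    """One round of replication, two divisions: four cells, number halved."""
    if chromosome_number % 2 != 0:
        raise ValueError("a diploid cell must have an even chromosome number")
    return [chromosome_number // 2] * 4


def fertilisation(gamete_a: int, gamete_b: int) -> int:
    """Fusion of two gametes gives the chromosome number of the zygote."""
    return gamete_a + gamete_b


human_diploid = 46
print(mitosis(human_diploid))       # [46, 46]
gametes = meiosis(human_diploid)
print(gametes)                      # [23, 23, 23, 23]
print(fertilisation(gametes[0], gametes[1]))  # 46
```

The last line shows why the reduction matters: only if each gamete carries half the complement does fusion restore the original diploid number.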


Despite the apparent similarity between mitosis and meiosis I (and indeed,
there are a number of molecular components in common between the two),
there are several features of meiosis I that are distinctive. A significant
difference is that as the replicated chromosomes condense in prophase I of
meiosis, the homologous chromosomes are actually physically paired together
in a process known as synapsis, which involves a protein complex called the
synaptonemal complex, and which relies on DNA sequence similarity between
the paired homologous chromosomes (one maternally contributed and one
paternally contributed). The matching DNA sequences of the synapsed
homologous chromosomes are lined up exactly, which allows swapping of
sections of homologous DNA sequence between the paired chromosomes
(Figure 4.4b). This phenomenon is known as recombination or crossing over,
a process in which the DNA helix in each chromosome is broken and rejoined
to the homologous chromosome. Crossing over therefore creates chromosomes
with new combinations of maternal and paternal alleles (gene variants) in the
cells resulting from meiosis I.


Figure 4.4 A comparison of mitosis and meiosis. This representation shows a cell that has replicated its DNA and is
now at prophase (top). As in Figure 4.3, the cell has two pairs of homologous chromosomes (one long pair and one
short pair) and each chromosome is composed of a pair of chromatids. (a) Mitosis yields two diploid daughter cells,
each with the same complement of homologous chromosome pairs as the parent cell. (b) In contrast, during meiosis,
the homologous chromosomes segregate during meiosis I, but the two chromatids composing each chromosome
remain attached together. The two chromatids are finally separated during meiosis II to yield four haploid daughter
cells (gametes), each of which has a single copy of each chromosome. In this example, recombination (crossing over)
has taken place between homologous chromatids in each chromosome pair at prophase I. As a consequence, some of
the resulting haploid gametes have chromosomes carrying new combinations of parental (male and female) gene
variants.

By the end of prophase I, the synaptonemal complex has disassembled, and
the homologues are only held together by their crossovers. During metaphase
I, the still attached homologous pairs of chromosomes are aligned at the
metaphase plate (unlike mitotic metaphase, where individual duplicated
chromosomes are aligned at the metaphase plate; compare Figure 4.4a and b).
You will recall from Activity 4.1 that meiosis can have more than one
outcome because the homologous pairs of chromosomes can align in random
orientations with respect to each other at metaphase, which determines which
cell they will segregate into at anaphase. Figure 4.4b shows only one of the
possible outcomes of meiosis in this example of a cell with two homologous
chromosome pairs. If the chromosomes had been oriented differently at
metaphase I and metaphase II, then the combinations of chromosomes
received by the final gametes would be different. This random segregation of
chromosomes is known as independent assortment of chromosomes
(Activity 4.1).
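For the cell with two homologous pairs used in Figure 4.4, the possible outcomes of independent assortment can be listed explicitly. The sketch below ignores crossing over and treats each chromosome as a simple label; the labels and the function name are invented for illustration.

```python
from itertools import product


def possible_gametes(homologous_pairs):
    """List every combination that takes one chromosome from each pair."""
    return [tuple(choice) for choice in product(*homologous_pairs)]


# Two homologous pairs, as in Figure 4.4: one long and one short.
pairs = [
    ("long-maternal", "long-paternal"),
    ("short-maternal", "short-paternal"),
]

for gamete in possible_gametes(pairs):
    print(gamete)
```

With two pairs there are four possible combinations; each additional pair doubles the number, because its orientation at metaphase I is independent of the others.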


During anaphase I, the homologous chromosomes separate and move to
opposite ends of the cell. This is in contrast to mitotic anaphase, when
individual sister chromatids segregate. You can see in Figure 4.4b that by the
end of telophase I and cytokinesis there are two cells, each containing one
chromosome from every homologous pair, with each chromosome still
consisting of two chromatids. Unlike the daughter cells produced by mitosis,
these cells are not genetically identical.


Following the completion of meiosis I, these two cells undergo meiosis II, in
which the chromosomes align on the metaphase plate (again in random
orientations) and the individual sister chromatids finally segregate during
anaphase II, ultimately yielding four haploid cells (gametes) with a single
copy of each type of chromosome (Figure 4.4b).


The occurrence of crossing over during meiosis is nearly universal in sexually
reproducing organisms, and each chromosome usually undergoes at least one
crossover event during meiosis. Thus, recombination between homologous
chromosomes in meiosis I and independent assortment of chromosomes during
anaphase I and anaphase II ensure the production of gametes with a large
number of different genotypes. In organisms such as humans (with 23
chromosome pairs), independent assortment alone can produce 2²³ (more than
8 million) different chromosome combinations, and crossing over makes the
number of genetically different gametes almost limitless (Activity 4.1).
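The scale of this variation can be checked with a quick calculation. The sketch assumes that each of the homologous pairs orients independently, so that each pair contributes two possibilities, and it ignores crossing over (which increases the numbers much further); the function name is invented for illustration.

```python
def assortment_combinations(haploid_number: int) -> int:
    """Number of chromosome combinations from independent assortment alone."""
    return 2 ** haploid_number


human_n = 23  # haploid chromosome number in humans
gamete_types = assortment_combinations(human_n)
zygote_types = gamete_types ** 2  # one gamete from each of two parents

print(f"{gamete_types:,}")  # 8,388,608
print(f"{zygote_types:,}")  # 70,368,744,177,664
```

Even before recombination is taken into account, two human parents could in principle produce tens of trillions of chromosomally distinct zygotes.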


During sexual reproduction, one haploid gamete fuses with another (usually 
from another individual, but this is not the case in all species) to form the 
diploid zygote, introducing yet more new genetic combinations. Sexual 
reproduction is therefore a highly efficient mechanism for creating genetic 
variation, and consequently phenotypic variation. 


4.3.2 Sexual life cycles 


These two forms of cell division (mitotic and meiotic cell division) mean that 
sexually reproducing eukaryotic organisms can effectively alternate between 
haploid and diploid phases during their life cycles, thereby increasing genetic 
diversity. Figure 4.5 shows the relationship between these phases. Most 
animals, including humans, spend the majority of their life cycle as diploid 
(2n) organisms (Figure 4.5a); the gametes are the only haploid (n) cells. The 
gametes don’t divide but fuse to form a diploid (2n) zygote which divides by
mitosis to produce the multicellular offspring. 


Figure 4.5 Sexually reproducing organisms alternate between haploid and diploid phases in their life cycles. (a) In
animals, the zygote undergoes mitosis. Haploid gametes are produced by meiosis and don’t divide further, but fuse to
form a zygote. (b) In plants (alternation of generations), spores are produced by meiosis but both the spores and
zygote can undergo mitosis, so there can be two multicellular stages. (c) In most fungi, the zygote immediately
undergoes meiosis to form spores which may then undergo mitosis to form either a unicellular or multicellular
organism.


Plants, and some algae, have a second type of life cycle called alternation of 
generations (Figure 4.5b) which has two multicellular stages, one haploid and 
one diploid. The multicellular diploid stage is called the sporophyte, which 
can undergo meiosis to produce haploid spores that develop by mitosis into a 
multicellular haploid gametophyte stage. Haploid gametes formed within the 
gametophyte can fuse to form a new diploid zygote, which grows into the 
multicellular sporophyte by mitosis. Different types of plants and algae, 
however, show great diversity in the balance between these haploid and 
diploid phases. Some algae have distinct sporophyte and gametophyte forms, 
while mosses exist as haploid gametophytes with only a brief sporophyte 
phase. Higher plants, such as the angiosperms (flowering plants), spend most 
of their life cycle as a diploid sporophyte which forms gametes (pollen and 
ova) by meiosis (the gametophyte is reduced to a rudimentary structure 
consisting of just a few cells within the plant). Most fungi (and some protists) 
have a third type of alternating life cycle (Figure 4.5c) in which haploid 
gametes fuse to form a zygote, which is the only diploid phase. The zygote 
then undergoes meiosis to produce haploid cells which divide by mitosis to 
form a haploid unicellular (e.g. yeasts) or multicellular adult, depending on 
the species (Section 2.4.2). The haploid adult produces haploid gametes by 
mitosis. You will note that both haploid and diploid cells may divide by 
mitosis, depending on the type of life cycle. However, only diploid cells ever 
undergo meiosis, the reductive form of cell division. 


This all sounds very complex, but the take-home message is that in all of 
these types of sexual life cycle, the advantage of having alternating phases of 
meiosis and fertilisation (fusion of gametes) is that it contributes to genetic 
variation among offspring because they receive new combinations of the
genetic material from two parents. The rest of this chapter considers how 
patterns of variation are inherited in sexually reproducing diploid organisms 
by the formation of haploid gametes that fuse to ultimately form the diploid 
adult organism. 


Summary of Sections 4.1 to 4.3 


* Variation among individuals of a species results from both heritable and
environmental factors. 


* The genome of a eukaryote comprises multiple linear DNA molecules 
packaged into chromosomes: diploid cells have two copies of each 
chromosome (one of paternal origin and one of maternal origin), while 
haploid cells carry one copy of each chromosome. 


* Mitotic cell division yields two daughter cells, each with the same 
chromosome complement as the parent cell. 


* In contrast, meiotic cell division yields four haploid daughter cells with 
unique genotypes due to the random assortment of chromosomes and 
genetic recombination (crossing over). 


* Sexually reproducing organisms alternate between haploid and diploid
phases in their life cycle, which has the benefit of increasing genetic 
variation among their offspring. 


4.4 Mendelian genetics 


The modern science of genetics has its roots in the pioneering work of Gregor 
Mendel (Figure 4.6a) in the 19th century, and hinges on the observation that 
many of the physical characteristics observed in individuals of a species are 
inherited in predictable ways. Mendel used the pea plant, Pisum sativum, as 
his model organism (Figure 4.6b), but the principles of inheritance that his 
work established can be applied to any diploid organisms that undergo sexual 
reproduction. Mendel’s key insight was that the heritability of physical 
characteristics depended on discrete factors originating from both parents. His 
insight was distinct from the then current belief that inheritance operated 
through a ‘blending’ of characteristics from both parents. 


As you will see, Mendel was able to conduct his research because he had 
identified gene variants that had visible effects on the appearance of the pea 
plants. Such gene variants are today known as alleles. Every diploid cell of an 
individual organism has two copies of each chromosome, and therefore two 
copies of each gene. Each cell may therefore carry two copies of the same 
gene allele or two different alleles (across the entire population of the species 
there may be many different alleles of a particular gene). When both alleles of 
a gene in a diploid organism are identical, the organism is said to be 
homozygous for that gene (or gene locus, plural loci, meaning ‘place’ or
‘position’ in the genome) and when the two alleles are different, the organism 
is said to be heterozygous for that gene. 




Figure 4.6 (a) The Austrian monk Gregor Mendel conducted the first genetic 
experiments, with peas. (b) Some phenotypic variants in the pea (Pisum sativum) 
of the kind used by Mendel in his experiments. The illustration shows flower and 
pea colour variants. 


4.4.1 Heritable traits and dominant and recessive alleles 


Mendel’s experiments with pea plants rested on the observation that variation 
in several visible characteristics of different strains of pea plants that he kept 
in cultivation was heritable. For example, some of Mendel’s experiments 
concerned the inheritance of violet or white petal colouration. 


Mendel started with pure-breeding pea plants, i.e. those which, when self- 
crossed, always yield progeny of the same phenotype as the parental plants. 
For example, when he self-crossed a strain of pure-breeding violet-petalled 
plants he obtained only violet-petalled progeny, while self-crossing pure- 
breeding white-petalled plants yielded only white-petalled progeny. When he 
crossed pure-breeding violet-petalled plants with pure-breeding white-petalled 
plants, however, he found that all of the progeny plants (known as the F1
generation) had violet petals: their flowers were indistinguishable from those 
of the parental violet-flowered plant. 


■ If inheritance occurred by a ‘blending’ of characteristics from both
parents, what might you expect the petal colour of the progeny from the
above cross to be?


□ If blending inheritance were to occur, you might expect the progeny
plants to have pale violet petals (mid-way between violet and white).


This form of inheritance, observed by Mendel in pea plants, is an example of
discontinuous variation. In other words, the phenotypes fall into distinct
classes, in this case either violet or white petals; there is nothing in between.
Mendel drew a number of conclusions from this observation:


* that heredity was due to specific and discrete factors (in contrast to the
prevailing view of the time, which favoured ‘blending’); these discrete
factors are now better known as genes


* that each plant possessed two copies of these factors


* that there were different versions of these factors (alleles), and that some
alleles are dominant over others, which are therefore said to be recessive.


In some situations, different alleles can all contribute to the phenotype, and
one is not dominant over the other. This is known as codominance; a
well-known example is the ABO blood group phenotype in humans.


Mendel deduced from the experiment described above that there was a factor 
(now known as a gene) for petal colour that existed in two variants (alleles), 
one allele for white petal colour and one allele for violet petal colour. 


■ Is the violet petal colour allele dominant or recessive to the white petal
colour allele in the cross described above?


□ All the F1 progeny of a cross between pure-breeding violet-petalled and
pure-breeding white-petalled plants have violet flowers, so the violet petal
colour allele is dominant.


By thinking about the cross using simple genetic terminology, you can see the 
basis behind Mendel’s conclusions (Figure 4.7a). Using the notation V (in 
upper case) to signify the dominant allele responsible for violet petals and v 
(in lower case) to signify the recessive allele that confers white petals, the 
genotypes of the two true-breeding parents are denoted V/V and v/v, with the 
corresponding phenotypes of violet and white petals, respectively. The slash 
‘/’ symbol is used to separate the two alleles carried by a homologous 
chromosome pair (Box 4.2). 


■ What term is used to describe the genotype of a plant that possesses two
copies of the same allele of the gene for petal colour, for example V/V?


□ The plant is described as homozygous, in this case for the V allele.


Recall that in flowering plants, each diploid parent produces haploid gametes 
(pollen or ova) by meiosis. Two haploid gametes then fuse to form the diploid 
offspring in each generation. In this petal colour example, the true-breeding 
violet-petalled (V/V) parent can only produce gametes carrying the dominant V 
allele, while the true-breeding white-petalled (v/v) parent can only produce 
gametes carrying the recessive v allele (Figure 4.7b). When the two plant 
strains are crossed, a gamete from one plant bearing the V allele will combine 
with a gamete from the other plant bearing the v allele, yielding a 
heterozygous diploid V/v zygote (Figure 4.7b). Because the V allele is 
dominant, all the V/v offspring will have violet petals. Note that this example 
only concerns two allelic versions of a single gene; it ignores all other genes, 
which may or may not be identical between the two pea strains. 


Figure 4.7 (a) The cross between pure-breeding violet-petalled pea plants
(genotype V/V) and pure-breeding white-petalled pea plants (genotype v/v) yields
only violet-petalled F1 offspring of genotype V/v. (b) A diagram showing the
inheritance of petal colour alleles and the chromosomes on which they are carried
during gamete fertilisation.


To conduct an experiment like this, Mendel would have had to ensure that 
pollen was shared only between plants of the two parent strains. To achieve 
this, plant geneticists typically remove the anthers (the flower structures that 
bear pollen) from one of the strains (thereby preventing self-fertilisation), and 
then manually pollinate the remaining stigma (part of the female reproductive 
structure) with pollen from the other strain. 


Box 4.2 A note on genetic nomenclature 


Historically, genes were frequently named on the basis of an unusual 
phenotype caused by a gene mutation, a change in the DNA sequence of 
the gene (Chapter 5). For example, the ‘wild type’ form of the popular 
genetic model organism Drosophila melanogaster, the fruit fly, has brick- 
red eyes. One of the first Drosophila mutations to be identified (by 
Thomas Hunt Morgan in 1910) caused the fly to develop white eyes, and 
was named ‘white’, abbreviated as w (note that the names and 
abbreviations for genes are conventionally written in italics). It was later 
discovered that this mutation causes white eyes because it inactivates the 
normal function of the white gene, which is (rather counter-intuitively) to 
ensure that the eyes contain red pigment. This tradition of naming genes 
to reflect a mutant phenotype has continued to the present day, though
many of the genes more recently discovered by genome DNA sequencing
(for which mutant alleles have not been found) have been given more
arbitrary names. As genetic research has developed over the last century,
researchers using different model systems developed distinct and
different systems of genetic nomenclature, including how gene names are
abbreviated and how allelic forms of a gene are written down. This can
be very confusing to the beginner, and indeed to the experienced
geneticist.


Wild type can be regarded as the ‘normal’ phenotype, or more
accurately the phenotype that most typically occurs in nature.


Typically, genotypes are written down with the genotype of each 
chromosome of a homologous chromosome pair separated by a slash, ‘/’. 
Only those genes relevant to the discussion will be included. For 
example, consider the genotype of a female fruit fly (with two X 
chromosomes, Figure 4.2c) that is heterozygous for wild type and mutant 
alleles of two genes called white (w) and singed (sn) that are both on the 
X chromosome (white mutations render the eyes white instead of brick 
red, while singed makes the fly’s bristles twisted in shape). The genotype
of this fly can be written as: 

w⁺ sn⁺ / w¹ sn¹

where the ‘+’ superscript indicates the wild type allele and the ‘1’
superscript indicates a particular type of mutant allele (because often
many alternative mutant alleles exist, which are each given their own
number code). So, in this fly, one of the X chromosome homologues
carries the wild type allele of each gene (w⁺ sn⁺), while the other
homologue carries a mutant allele of each gene (w¹ sn¹).


Later on in the chapter, the inheritance pattern of two genes on different 
chromosomes is discussed. The genotypes of two different chromosomes 
are conventionally shown separated by a semicolon. In the example 
shown below, the genotype indicates a Drosophila that is heterozygous 
for two genes that are on different chromosomes; the white gene (w) is 
on the X chromosome (as described above) and the ebony gene (e) is on 
chromosome 3. Ebony mutations make the fly’s body very dark in 
colour: 


w⁺/w¹ ; e⁺/e¹

So in this fly, the homologous pair of X chromosomes carries one wild
type and one mutant white allele, while the homologous chromosome 3
pair carries one wild type and one mutant ebony allele. Since the wild
type allele of each gene is dominant, this fly will have red eyes and a
pale body. In this chapter, gene terminology will be explained before
each example.


Mendel went on to cross his violet-petalled F1 progeny pea plants (which have
the genotype V/v, as established above) with each other. He observed that their
progeny (the F2 generation) contained both violet-petalled and white-petalled
plants (Figure 4.8). Recall that as a result of meiosis, V/v plants will produce 
haploid gametes with two different genotypes, either V or v, at equal 
frequency, i.e. about equal numbers of each type of gamete will be produced 
(Figure 4.9). 


Figure 4.8 When F1 plants from the cross shown in Figure 4.7 are crossed
together (or allowed to self-fertilise), the F2 generation will contain both
violet-petalled plants (with genotype V/V or V/v) and recessive phenotype
white-petalled plants (with genotype v/v).


The best method of working out how the phenotypes of the F2 progeny arise
from crossing V/v genotype individuals from the F1 generation is to use a
Punnett square (named after the pioneering British geneticist Reginald C.
Punnett). In a Punnett square, the genotypes of the gametes produced by each
parent are indicated along two sides of the square (Table 4.2), and the progeny
formed by the fusion of those gametes are deduced by cross-referencing pairs
of gametes to fill in each section of the grid. So for the cross V/v × V/v, the
Punnett square shown in Table 4.2 can be generated.




Figure 4.9 A plant of genotype V/v can produce two types of gamete. Here the 
blue chromosome of the homologous pair carries the V allele, while the red 
chromosome carries the v allele. Haploid V and v gametes are produced at equal
frequencies. 


Table 4.2 A simple Punnett square to show F2
progeny that result from a cross between F1 pea
plants that are heterozygous (V/v) for petal colour.
The genotypes of the gametes produced by the two
parents are shown along the left side and along the
top of the square. The genotype and phenotype of
each possible combination of alleles in the F2
progeny is indicated.

                        Gametes from parent 1
                        V                v
Gametes from    V       V/V              V/v
parent 2                violet petals    violet petals
                v       V/v              v/v
                        violet petals    white petals


By using the Punnett square, you can easily work out the proportions of 
violet-petalled and white-petalled plants that will result from this cross. 


■ What proportion of the F2 progeny will be violet-petalled plants?


□ Three-quarters (75%) of the F2 progeny carry at least one copy of the
dominant V allele, and will therefore have violet petals.


The ratio is therefore 3 : 1, three violet-petalled plants to one white-petalled
plant.


■ What proportion of the F2 progeny will be pure-breeding violet-petalled
plants?


□ Only one-quarter (25%) of the F2 progeny are V/V and therefore
pure-breeding violet-petalled plants.
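
The reasoning behind the Punnett square can also be written out as a short Python sketch. This is only an illustration, under the assumptions used in the text: each parent produces its two gamete types at equal frequency, and V is fully dominant over v. The function names (gametes, normalise, punnett, phenotype) are invented for this example.

```python
from collections import Counter
from itertools import product


def gametes(genotype: str) -> list[str]:
    """Split a single-gene genotype such as 'V/v' into its two gamete alleles."""
    return genotype.split("/")


def normalise(a: str, b: str) -> str:
    """Write the upper-case allele first, so that v/V and V/v count as one genotype."""
    return "/".join(sorted((a, b)))  # upper-case letters sort before lower-case


def punnett(parent1: str, parent2: str) -> Counter:
    """Count progeny genotypes over every pairing of gametes from the two parents."""
    pairs = product(gametes(parent1), gametes(parent2))
    return Counter(normalise(a, b) for a, b in pairs)


def phenotype(genotype: str) -> str:
    """Assumes V (violet) is fully dominant over v (white)."""
    return "violet" if "V" in genotype else "white"


f2 = punnett("V/v", "V/v")
phenotypes = Counter()
for genotype, count in f2.items():
    phenotypes[phenotype(genotype)] += count

print(dict(f2))          # {'V/V': 1, 'V/v': 2, 'v/v': 1}
print(dict(phenotypes))  # {'violet': 3, 'white': 1}
```

Calling punnett("V/V", "v/v") reproduces the first cross, in which all four boxes of the square contain V/v.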


4.4.2 Independent assortment of genes on different 
chromosomes 


The simple example of Mendelian inheritance described in the preceding 
section concerns two alleles of a single gene at a single locus in the genome. 
You may wonder what the inheritance patterns of two genes affecting separate
characteristics might be. The following example introduces two new 
phenotypes, each caused by a single gene variant. 


The seeds (the peas) produced by pea plants may be green or yellow in colour 
and they may be round or wrinkled in shape (this is due to the morphology of 
the two cotyledons, or seed leaves, which make up the bulk of the pea seed). 
These characteristics of peas are determined by two genes that lie on different 
chromosomes. In this example, R is used to indicate the dominant allele that 
leads to seeds with a round shape and r to indicate the recessive allele that 
leads to wrinkled seeds. Similarly, Y denotes the dominant seed colour allele 
leading to yellow seeds and y denotes the recessive allele that leads to green 
seeds. You could therefore represent the genotype of a plant heterozygous at 
both gene loci as follows: 


R/r; Y/y


Recall that the genotypes of the two different chromosomes are separated by 
the semicolon, and the genotypes of the two chromosomes of the homologous 
pair are separated by a slash, ‘/’ (Box 4.2). 
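
As an aside, the notation is regular enough to be read by a few lines of Python. The sketch below is a hypothetical helper, not a standard tool: it assumes one single-letter gene per chromosome pair and names each gene after the upper-case form of its allele letter.

```python
def parse_genotype(text: str) -> dict[str, tuple[str, str]]:
    """Parse a genotype such as 'R/r; Y/y' into {'R': ('R', 'r'), 'Y': ('Y', 'y')}.

    Semicolons separate different chromosomes; a slash separates the two
    homologues of one chromosome pair.
    """
    genotype = {}
    for pair in text.split(";"):
        first, second = (allele.strip() for allele in pair.split("/"))
        genotype[first.upper()] = (first, second)
    return genotype


def zygosity(alleles: tuple[str, str]) -> str:
    """Identical alleles on both homologues mean the locus is homozygous."""
    return "homozygous" if alleles[0] == alleles[1] else "heterozygous"


for gene, alleles in parse_genotype("R/r; Y/y").items():
    print(gene, alleles, zygosity(alleles))
```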


■ What phenotype would seeds with the genotype R/r; Y/y have?


□ These seeds are heterozygous for R and r. R denotes the round seed
allele, which is dominant over r (the wrinkled seed allele), so these seeds
will be round. Similarly, the seeds are heterozygous for Y and y; since the
yellow allele Y is dominant over the green allele y, the seeds will be
yellow. R/r; Y/y seeds will all be round and yellow.


During meiosis (Section 4.3.1 and Activity 4.1), the chromosomes are 
randomly oriented at metaphase and independently assorted into gametes. In a 
plant of genotype R/r; Y/y there are therefore two possible outcomes for the
independent assortment of these gene alleles during the first meiotic division. 




An R/r; Y/y plant can therefore generate gametes with four different possible 
genotypes, as shown in Figure 4.10. 


■ What are the different gamete genotypes you would expect to be
produced by R/r; Y/y plants?


□ The plants are heterozygous for alleles of both genes. The gametes may
be R; Y, R; y, r; Y or r; y, so there are four possible gamete genotypes.


Figure 4.10 A plant of genotype R/r; Y/y can produce four different gamete genotypes. In this example, the
dominant alleles R and Y are carried on the blue chromosomes, while the recessive alleles r and y are carried on the
red chromosomes. Segregation at meiosis I depends on the orientation of the chromosomes at metaphase I. For
example, both red chromosomes could segregate to the same cell (outcome 1) or to different cells (outcome 2). Both
outcomes are equally likely, so the four different gamete genotypes are produced at equal frequencies.


These four classes of gamete arise at equal frequency; that is, the parent plant 
is equally likely to produce all the possible gamete genotypes. You can again 
use a Punnett square to see how many distinct genotypes will be present in 
the progeny of a cross between R/r; Y/y plants (Table 4.3). Making the 
assumption that all four gamete genotypes occur at the same frequency 
permits you to predict the frequency of each progeny genotype. You will also 
note that with four different gamete genotypes from each parent to consider, 
the table is rather larger than the example shown earlier. 




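Table 4.3 A Punnett square illustrating the result of a cross between two pea plants that
are heterozygous for two genes on separate chromosomes.

                          Gametes from parent 1
                  R; Y          R; y          r; Y          r; y
Gametes   R; Y    R/R; Y/Y      R/R; Y/y      R/r; Y/Y      R/r; Y/y
from              round,        round,        round,        round,
parent 2          yellow        yellow        yellow        yellow
          R; y    R/R; Y/y      R/R; y/y      R/r; Y/y      R/r; y/y
                  round,        round,        round,        round,
                  yellow        green         yellow        green
          r; Y    R/r; Y/Y      R/r; Y/y      r/r; Y/Y      r/r; Y/y
                  round,        round,        wrinkled,     wrinkled,
                  yellow        yellow        yellow        yellow
          r; y    R/r; Y/y      R/r; y/y      r/r; Y/y      r/r; y/y
                  round,        round,        wrinkled,     wrinkled,
                  yellow        green         yellow        green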


Each parent can produce four different gamete genotypes, so there are
4 × 4 = 16 possible combinations of gametes. Looking carefully at Table 4.3,
you may have noticed that not all 16 gamete combinations yield unique
genotypes.


■ How many distinct genotypes are there in the Punnett square above?


□ There are nine distinct genotypes.


Not all of these nine genotypes correspond to unique phenotypes. For example 
R/R ; Y/Y corresponds to a phenotype of round, yellow seeds, as does R/R ; Y/y. 
This means that of the 16 boxes in the Punnett square in Table 4.3, there are 
only nine distinct genotypes, and four different phenotypes. The four 
phenotypes are indicated in the Punnett square by differential shading. 


■ If you conducted a cross such as that described above, what would be the
expected ratio of each of the four phenotypes among the progeny?


□ The round and yellow phenotype occurs in nine of the 16 boxes, round
and green in three of the boxes, wrinkled and yellow in three of the
boxes and the wrinkled and green occurs in only one of the boxes. The
expected ratio of the four phenotypes would be 9:3:3:1.


The phenotype ratio 9:3:3:1 is characteristic of a heterozygous cross 
involving two genes that assort independently. Independent assortment is a 
characteristic of two genes located on different chromosomes. How does the 
inheritance pattern differ if two genes are located on the same chromosome? 
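
The counts for the two-gene cross can be checked by enumerating all 16 gamete pairings in Python. The sketch below is illustrative only and rests on the assumptions stated in the text: the two genes assort independently, the four gamete genotypes are equally frequent, and R and Y are fully dominant. The names GENES, TRAITS, gametes, genotype and phenotype are invented for this example.

```python
from collections import Counter
from itertools import product

GENES = [("R", "r"), ("Y", "y")]  # (dominant, recessive) allele of each gene
TRAITS = {"R": "round", "r": "wrinkled", "Y": "yellow", "y": "green"}


def gametes(parent):
    """All gametes of a parent: one allele per gene, chosen independently."""
    return list(product(*parent))


def genotype(g1, g2):
    """Combine two gametes gene by gene, writing the upper-case allele first."""
    return "; ".join("/".join(sorted(pair)) for pair in zip(g1, g2))


def phenotype(g1, g2):
    """A trait shows its dominant form if either gamete carries the dominant allele."""
    shown = []
    for (dom, rec), a, b in zip(GENES, g1, g2):
        shown.append(TRAITS[dom] if dom in (a, b) else TRAITS[rec])
    return ", ".join(shown)


parent = GENES  # R/r; Y/y: heterozygous at both loci
pairs = list(product(gametes(parent), gametes(parent)))

genotypes = Counter(genotype(a, b) for a, b in pairs)
phenotypes = Counter(phenotype(a, b) for a, b in pairs)

print(len(pairs), "gamete combinations,", len(genotypes), "distinct genotypes")
for name, count in phenotypes.items():
    print(f"{count}/16 {name}")
```

The enumeration gives 16 pairings, nine distinct genotypes and four phenotype classes in the proportions 9:3:3:1.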


4.4.3 Linkage and recombination


Two genes that are located on the same chromosome are said to be linked; 
that is, they are unable to assort independently. Linkage was first recognised 
in crosses that deviated from the characteristic Mendelian heterozygous cross 
ratios of 9:3:3:1 described above. This will be illustrated with an example 
of two linked genes: one is a different gene that affects flower colour (which 
has an allele for purple flowers, P, which is dominant to an allele for red
flowers, p); and the other is a gene that affects pollen grain shape (which has
an allele for long pollen grains, L, which is dominant to an allele for short
round pollen grains, l).


When pure-breeding purple-flowered, long-pollen plants (PL/PL) are crossed 
with pure-breeding red-flowered, short-pollen plants (pl/pl), all the progeny
are heterozygous at both loci, as shown in Table 4.4 below. 


Table 4.4 A Punnett square showing the cross
between pure-breeding purple-flowered, long-pollen
pea plants (PL/PL) and pure-breeding red-flowered,
short-pollen pea plants (pl/pl). The F1 progeny all
have genotype PL/pl.

                                  Gametes from PL/PL plant
                                  PL
Gametes from pl/pl plant   pl     PL/pl
                                  purple flower, long pollen


Compare the notation used here with that used in the previous example 
(Section 4.4.2). Here, because both genes (P or p and L or l) lie on the same
chromosome, they are shown paired together, and the two homologous 
chromosomes are separated by a ‘/’. 


Since the two loci are linked by being on the same chromosome, the pure- 
breeding parents can produce only one type of gamete (one parent produces 
only PL gametes, while the other produces only pl gametes). The cross
between pure-breeding purple-flowered, long-pollen plants (PL/PL) and
red-flowered, short-pollen plants (pl/pl) therefore yields an F1 generation in which
all individuals are of the genotype PL/pl.


■ What is the phenotype of plants of genotype PL/pl?


□ Plants of this genotype will have purple flowers with long pollen grains.


All the plants in the F1 generation are heterozygous at both loci, and display
the phenotype corresponding to the dominant allele at each locus; that is, they 
will have purple flowers and long pollen grains. 


What gametes might the F1 progeny plants produce? This time, you need to
take into account the possibility of recombination (crossing over) occurring
between homologous chromosomes during meiosis (Section 4.3.1). Note that
recombination will also have occurred during meiosis in the pure-breeding
(homozygous) PL/PL and pl/pl parents of the F1 generation, but since the
homologous chromosomes are the same in that case, no new combinations of
alleles would result, so there is no need to worry about recombination in a
homozygous plant. The F1 plants in the cross shown in Table 4.4 are all,
however, heterozygous, PL/pl. If there were no recombination during meiosis
you would expect them to produce gametes with two different genotypes,
either PL or pl (Table 4.5).


Table 4.5 A Punnett square showing the expected
result of a cross between F1 pea plants with
genotype (PL/pl) assuming no recombination
during meiosis.

                                  Gametes from PL/pl plant
                                  PL                  pl
Gametes from PL/pl plant   PL     PL/PL               PL/pl
                                  purple flower,      purple flower,
                                  long pollen         long pollen
                           pl     PL/pl               pl/pl
                                  purple flower,      red flower,
                                  long pollen         short pollen

If this were the case, you would expect to see F2 offspring with two
phenotypes: purple-flowered, long-pollen grain plants (PL/PL or PL/pl) and
red-flowered, short-pollen grain plants (pl/pl) in the ratio 3 : 1. However, if
recombination between homologous chromosomes has taken place during
meiosis to produce the F1 gametes, in addition to the gamete genotypes shown
in Table 4.5, you would expect the production of gametes with some new
gene combinations, including Pl and pL gametes (Figure 4.11). The gametes
with these new genotypes are known as recombinants.
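
The effect of recombination on gamete production can be sketched in Python. The function below is an illustration, not part of the original text: it takes a recombination frequency r, expressed as a fraction, and assumes that the two non-recombinant gamete types are equally frequent, as are the two recombinant types. The upper limit of 0.5 reflects the standard result that two loci cannot appear more recombinant than independently assorting genes.

```python
def gamete_frequencies(r: float) -> dict[str, float]:
    """Expected gamete frequencies from a PL/pl plant.

    r is the recombination frequency between the two loci as a fraction
    (0 means no recombination; 0.5 behaves like independent assortment).
    """
    if not 0.0 <= r <= 0.5:
        raise ValueError("r must lie between 0 and 0.5")
    return {
        "PL": (1 - r) / 2,  # non-recombinant
        "pl": (1 - r) / 2,  # non-recombinant
        "Pl": r / 2,        # recombinant
        "pL": r / 2,        # recombinant
    }


for r in (0.0, 0.1, 0.5):
    print(r, gamete_frequencies(r))
```

With r = 0 only PL and pl gametes are produced, which is the situation assumed in Table 4.5; with r = 0.5 all four gamete types are equally frequent, as for genes on different chromosomes.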


In order to discover if recombination has indeed taken place between the two
gene loci during meiosis in the F1 plants, it is necessary to conduct a cross
between the PL/pl plants of the F1 generation and plants of the homozygous
genotype pl/pl, as shown in the Punnett square in Table 4.6. The homozygous
pl/pl plants can only generate a single gamete genotype (pl) carrying recessive
alleles at both loci, thus the frequency of each phenotype among the F2
offspring of the cross will reveal the frequency of the four different gamete
genotypes (non-recombinant and recombinant) produced by the PL/pl parent.
Table 4.7 shows some representative phenotype data from such a cross.


Table 4.6 A Punnett square to show the results of a cross between an F1 generation
PL/pl heterozygote and a pl/pl homozygote pea plant.

                         Non-recombinant gametes           Recombinant gametes
                         from F1 PL/pl plants              from F1 PL/pl plants
                         PL               pl               Pl               pL
Gametes from      pl     PL/pl            pl/pl            Pl/pl            pL/pl
the pl/pl plant          purple flower,   red flower,      purple flower,   red flower,
                         long pollen      short pollen     short pollen     long pollen



Figure 4.11 Recombination (crossing over) between homologous chromosomes 
during meiosis results in new combinations of linked gene alleles. In this example, 
the genes for flower colour and pollen shape lie on the same chromosome. The 
linked alleles PL are carried on one of the homologues (blue) and the linked 
alleles pl on the other homologue (red). Note that some of the gametes resulting
from meiosis of this cell will have new combinations of the linked genes (pL and 
Pl) due to recombination between the red and blue homologues. 


Table 4.7 Representative data showing numbers and frequency (in brackets) of
different phenotypes among the F2 offspring resulting from a cross between the F1
generation PL/pl heterozygote and the pl/pl homozygote shown in Table 4.6.


Phenotype                          Plant genotype    Number of plants (%)
(a) purple flower, long pollen     PL/pl             120 (39.5%)
(b) purple flower, short pollen    Pl/pl              33 (10.9%)
(c) red flower, long pollen        pL/pl              36 (11.8%)
(d) red flower, short pollen       pl/pl             115 (37.8%)
Total                                                304 (100%)


Of the four offspring phenotypes shown in Table 4.7, (a) and (d) correspond 
to plants formed from fusion between gametes in which recombination did not 
occur, while phenotypes (b) and (c) correspond to gametes in which 
recombination did occur between the two gene loci. The recombinant gametes 
occur at a much lower frequency than non-recombinants. The percentage of 
the different phenotypes can be used to estimate the frequency at which 
recombination has occurred between the two gene loci — this is the 
recombination frequency. The recombination frequency between these two 
gene loci, expressed as a percentage, is the total number of recombinant 
phenotypes ((b) plus (c) in Table 4.7) divided by the total progeny, multiplied 
by 100: 


\[
\frac{33 + 36}{304} \times 100\% = 22.7\%
\]


The recombination frequency for these two gene loci is therefore 22.7%. As 
you will see in the next section, this value can be used to work out the 
distance between genes and prepare maps of the relative locations of genes on 
a chromosome. 
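
The same calculation can be expressed as a few lines of Python, using the counts from Table 4.7. This is a minimal sketch; the function name recombination_frequency and the dictionary layout are choices made for this example.

```python
# Progeny counts from the test cross PL/pl x pl/pl (Table 4.7).
counts = {
    "PL/pl": 120,  # purple flower, long pollen (non-recombinant)
    "Pl/pl": 33,   # purple flower, short pollen (recombinant)
    "pL/pl": 36,   # red flower, long pollen (recombinant)
    "pl/pl": 115,  # red flower, short pollen (non-recombinant)
}
RECOMBINANT = {"Pl/pl", "pL/pl"}


def recombination_frequency(counts, recombinant):
    """Recombinant progeny as a percentage of all progeny."""
    total = sum(counts.values())
    recombinants = sum(n for g, n in counts.items() if g in recombinant)
    return 100 * recombinants / total


rf = recombination_frequency(counts, RECOMBINANT)
print(f"{rf:.1f}%")  # 22.7%
```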


In fact, the general mechanism of DNA recombination in terms of breaking 
and rejoining DNA molecules is not restricted to meiosis, but is quite 
widespread in a number of situations; it is utilised by bacteria during their 
own form of sexual reproduction (although normally most bacteria divide 
asexually by binary fission, Activity 4.1), and in the repair of damaged DNA 
(Chapter 5). 


4.5 Recombination frequencies and genetic maps 


In 1913, Alfred Sturtevant, then an undergraduate student in T. H. Morgan’s 
lab (in which the fruit fly Drosophila melanogaster was first adopted for 
genetic research), realised that the frequency of recombination between linked 
genes reflected the physical distance between them on the chromosome; in 
other words, the greater the distance between linked genes, the greater the 
chance that homologous chromosomes would cross over in the region between 
the genes during meiosis. The recombination frequency between two distant 
gene loci is therefore much higher than that between two nearby gene loci. 
Sturtevant used this insight to generate the first genetic map showing the order 
of several genes on one of the four chromosomes of Drosophila. This map
was derived by measuring the recombination frequencies between pairs of
gene alleles on the X chromosome (Figure 4.12).


Map positions shown in Figure 4.12a (map units): B 0.0; C and O 1.0; P 30.7; R 33.7; M 57.6


Figure 4.12 (a) The first genetic map, constructed by Alfred Sturtevant and 
published in 1913, of the X chromosome in Drosophila. Note that this publication 
predated the adoption of current genetic nomenclature in Drosophila. (b) Alfred 
Sturtevant in World War I uniform. 


In the map shown in Figure 4.12, two of the alleles mapped were inseparable 
by recombination (O and C). These turned out to be two different mutant 
alleles of the same gene, now referred to as white (Box 4.2). The scale on the 
map shows a numeric value of recombination frequency. One recombination 
map unit corresponds to a recombination frequency of 1%, and is often 
referred to as a centimorgan, in honour of T. H. Morgan, the father of 
Drosophila genetics. 


Recombination map distances are, broadly speaking, additive; so, for example, 
in Figure 4.12, you might expect the distance between P and M to be the sum 
of the distances between P and R and R and M. However, this relationship 
starts to break down as the loci being examined get further apart. 


= Why might recombination map distances not be additive over longer 
distances? 


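© Because, if the interval between the two loci is very large, there is a
greater possibility of more than one crossover occurring during meiosis.
A second crossover between the same two loci restores the parental
combination of alleles, so it goes undetected and the measured
recombination frequency underestimates the true distance.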
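
The additivity of short map distances, and its breakdown over longer intervals, can be illustrated numerically. The map positions below are as read from Figure 4.12. The second function uses Haldane's mapping function, which is not part of this chapter and assumes that crossovers occur independently along the chromosome; it is offered only as a rough sketch of why the observed recombination frequency falls short of the true map distance. All function names are hypothetical.

```python
import math

# Map positions in map units, as read from Figure 4.12.
POSITIONS = {"B": 0.0, "C": 1.0, "O": 1.0, "P": 30.7, "R": 33.7, "M": 57.6}


def map_distance(locus_a, locus_b, positions=POSITIONS):
    """Distance in map units between two loci on the same map."""
    return abs(positions[locus_a] - positions[locus_b])


def expected_rf_haldane(distance_map_units):
    """Expected recombination frequency (%) for a given true map distance.

    ASSUMPTION: crossovers are independent (Haldane's model). An even number
    of crossovers between two loci restores the parental combination, so
    those events are not counted as recombinants.
    """
    morgans = distance_map_units / 100
    return 100 * 0.5 * (1 - math.exp(-2 * morgans))


# Short intervals add up: P-R plus R-M equals P-M on the map.
print(map_distance("P", "R"), map_distance("R", "M"), map_distance("P", "M"))

# Observed frequency lags further behind as the true distance grows.
for distance in (1, 10, 50, 100):
    print(distance, round(expected_rf_haldane(distance), 1))
```

Under this model the observed frequency approaches, but never exceeds, 50%, which is the value expected for genes that assort independently.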


The recombination maps of Drosophila improved rapidly. By 1915 the 
Drosophila genetic map consisted of 36 genes and spanned all four 
chromosomes; by 1925, 112 Drosophila genes had been mapped. The advent 
of genetic mapping (or recombination mapping) brought with it a rapid growth 
in genetics research as investigators developed recombination-based maps of a 
variety of species. Such maps have been a fundamental tool in the genetic 
analysis of model organisms, and have practical applications in human health 
and animal breeding. 


In the present day, genomic DNA sequencing (Chapter 5) has in many ways 
eclipsed the importance of classical genetic mapping. Complete genomic DNA 
sequences have permitted the assembly of detailed gene maps of an increasing
number of species. In the case of Drosophila melanogaster, genomic 
sequencing has revealed the total gene number to be between 13 500 and
14 000 genes. As you will see in the next chapter, however, there has been a
revival in the development of high-throughput gene mapping techniques, 
driven by research into the genetic variation between individuals. 


4.6 Sex-linked genes 


So far, this chapter has considered Mendelian segregation patterns for which 
two copies of each gene are present, one on each of the two chromosomes in 
a homologous pair, as is typical for diploid organisms. As explained in 
Section 4.3, the human karyotype (Figure 4.2) comprises 46 chromosomes: 22 
homologous pairs of autosomes, which are found in both males and females; 
plus one pair of chromosomes that differ in males and females, known as 
heterosomes, or sex chromosomes. The 22 autosomes are numbered 1 to 22,
while the heterosomes are referred to as the X and the Y chromosomes. The 
heterosomes are the main determinant of sex: for example, in both humans 
and flies, females possess two X chromosomes, while males possess one X 
chromosome and one Y chromosome. For species such as these, the males are 
said to be the heterogametic sex (in some species, for example birds, females 
are the heterogametic sex, and the sex chromosomes are named W and Z). In 
the case of the human XY chromosome pair, the Y chromosome usually 
contains very few genes. This has important consequences for the inheritance 
patterns of genes located on the X chromosome, as you will see later. 


Sex linkage can be illustrated with the example of the inheritance of the white 
gene which is on the X chromosome in the Drosophila fruit fly. Recall that 
white affects the pigmentation of the eye (Box 4.2). Wild type flies have 
brick-red eyes, while flies mutant for white have white eyes (Figure 4.13). As 
is the case for humans, female Drosophila have two X chromosomes and 
therefore two copies of each X chromosomal gene, including white, while 
males have one X chromosome and one Y chromosome (which contains 
hardly any genes). Males therefore have only a single copy of most X 
chromosome genes, including white. 


Figure 4.13 The heads of white-eyed and wild type Drosophila. Left: the 
appearance of a white-eyed mutant fly; right: the normal (wild type) eye colour. 




Recall that w⁺ and w¹ indicate wild type and mutant alleles of the white gene,
respectively (Box 4.2), and that w⁺ is dominant to w¹.


= What would be the phenotype of a w⁺/w¹ fly? What sex would this fly
be?


© The fly has one wild type (w⁺) and one mutant (w¹) allele of white. The
wild type allele is dominant, so this fly would have brick-red coloured
eyes. The fly has two white alleles and therefore must have two X
chromosomes, so it will be female.


= What must be the genotype of a white-eyed female fly?


© The (w¹) mutant allele is recessive, so a white-eyed female must be
homozygous for two recessive alleles; that is, the genotype must be
w¹/w¹.


In genetic nomenclature, Y represents the Y chromosome, which pairs with 
the X chromosome during meiotic cell division, but does not carry a copy of 
the white gene. 


= What must be the genotype of a red-eyed male fly? 


© The male only has one X chromosome, so this must carry the wild type
allele (w⁺). The Y chromosome doesn’t carry the white gene, so the
genotype is w⁺/Y.


Consider the following cross between pure-breeding white-eyed female flies
and male flies with wild type, brick-red eyes:
female w¹/w¹ × male w⁺/Y


As was done earlier in this chapter, a Punnett square (Table 4.8) can be used 
to predict the progeny from this cross. 


Table 4.8 A Punnett square showing sex-linked
inheritance of Drosophila eye colour in a cross
between a white-eyed female and a red-eyed male.


                   | male gamete w⁺          | male gamete Y
female gamete w¹   | w⁺/w¹ red-eyed female   | w¹/Y white-eyed male


Since the female parents are from a pure-breeding white-eyed stock, you know
that they must be homozygous for w¹, and can only produce gametes (eggs)
bearing the w¹ allele. The males, on the other hand, may produce sperm
bearing either an X chromosome (in this case carrying the wild type w⁺ allele)
or a Y chromosome. Thus, as revealed by the Punnett square (Table 4.8), only
two genotypes will be present in the F₁ generation. All the male offspring will
be white-eyed, while all the female offspring will be red-eyed: the white gene 
is therefore said to be sex-linked (more properly, X-linked, because it is 
carried on the X sex chromosome). 




A different result is found when this cross is set up with males drawn from a 
pure-breeding white-eyed stock and females from a pure-breeding red-eyed 
stock. 


= What do you expect would be the genotype and phenotype of the
progeny from a cross between white-eyed males and pure-breeding
red-eyed females?


© As shown using a Punnett square (Table 4.9), the male progeny will be
w⁺/Y and have red eyes, while the female progeny will be w⁺/w¹ and
also have red eyes.


Table 4.9 A Punnett square showing sex-linked
inheritance of Drosophila eye colour in a cross
between a red-eyed female and white-eyed male.


                   | male gamete w¹          | male gamete Y
female gamete w⁺   | w⁺/w¹ red-eyed female   | w⁺/Y red-eyed male
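

The two reciprocal crosses can also be worked through programmatically. The sketch below enumerates every egg and sperm combination, which is what a Punnett square does by hand. The wild type and mutant alleles are written as the strings "w+" and "w1", the Y chromosome as "Y"; these encodings and the function names are assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def eye_colour(genotype):
    """Red eyes if at least one dominant wild type allele is present."""
    return "red" if "w+" in genotype else "white"


def sex(genotype):
    """Flies carrying a Y chromosome are male; otherwise female."""
    return "male" if "Y" in genotype else "female"


def cross(mother, father):
    """Count the equally likely progeny classes of mother x father.

    Each parent is a pair of X-linked entries, e.g. ("w1", "w1") for a
    white-eyed female or ("w+", "Y") for a red-eyed male.
    """
    progeny = Counter()
    for egg, sperm in product(mother, father):
        child = (egg, sperm)
        progeny[(sex(child), eye_colour(child))] += 1
    return progeny


# White-eyed female x red-eyed male (as in Table 4.8).
print(dict(cross(("w1", "w1"), ("w+", "Y"))))

# Red-eyed female x white-eyed male (as in Table 4.9).
print(dict(cross(("w+", "w+"), ("w1", "Y"))))
```

The first cross gives red-eyed daughters and white-eyed sons; the reciprocal cross gives red-eyed progeny of both sexes. This difference between reciprocal crosses is the signature of X linkage.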


Sex linkage can occur in any species where the sexes are distinguished by 
heterosomes. There are several well-known examples of sex linkage in 
humans, including haemophilia (seen, for example, in the pedigree of 
European royal families, Section 4.7.2) and red-green colour vision 
deficiency. In both these cases, males possessing a single X-linked recessive 
mutation will be affected by the condition, but affected females are very rare. 


Summary of Sections 4.4 to 4.6 


• A genotype is the genetic make-up of an organism (often referring to a
specific gene or genes). A phenotype is a characteristic, or trait, that
depends on the genotype.

• Using genetic variants of pea plants, Mendel showed that the patterns of
inheritance are predictable and depend on discrete factors, now recognised
as genes.

• Variants of genes (alleles) can confer different phenotypes; some alleles
may be dominant, others may be recessive.

• Diploid organisms have homologous pairs of autosomes and therefore
carry two copies of most of their genes. Homozygotes carry two copies of
the same allele. Heterozygotes carry two different alleles, and the
phenotype will reflect the dominant allele.

• Genes located on separate chromosomes are inherited independently, while
genes located on the same chromosome are not and are said to be linked.

• Linked genes can be separated during meiosis by genetic recombination
(crossing over). The frequency of recombination between two linked genes
is related to the physical distance between them on the chromosome; this
enables genetic maps to be constructed.

• Sex linkage is seen when genes are located on the sex chromosomes
(heterosomes), and results in phenotypes correlated with sex.


4.7 Human pedigree analysis 


So far, this chapter has considered patterns of inheritance using the 
experimental model organisms peas and fruit flies. Mendelian inheritance 
patterns hold true for all organisms that undergo sexual reproduction, and 
inheritance in humans is no different. This section will move on to consider 
how Mendelian genetics can be used to understand genetic disorders in 
humans. Of course, genetic research in humans can be rather complex, and 
must be conducted without the benefit of designing specific crosses! With the 
availability of a complete DNA sequence of the human genome, however, 
came a treasure trove of data about genetic variation in humans (Chapter 5). 
This has revolutionised genetic analysis in humans and, indeed, in a variety of 
other species for which conventional genetics is impossible or difficult. 


4.7.1 Determining dominance and recessivity from 
family pedigrees 


Pedigree analysis is frequently used to characterise the nature of inheritance of 
genetic variants or disorders affecting humans. This relies on unpicking the 
patterns of inheritance from family trees (or pedigrees) in which, for example, 
an inherited form of a particular disease or disorder has been identified. When 
reviewing a pedigree, one is generally making deductions about the 
inheritance of a characteristic (or trait) solely on the basis of phenotypes of 
individuals in several generations of a family. Pedigrees are drawn up using 
the symbols shown in Figure 4.14. 


[Figure 4.14 symbol key: male (square); female (circle); mating (line joining two symbols); affected (filled symbol)]


Figure 4.14 Pedigrees are generally drawn using a set of standard symbols. Male 
individuals are represented by squares and females by circles, and a line joining 
two such symbols indicates mating. Individuals affected by a particular phenotype 
are indicated by black filled symbols, unaffected individuals by unfilled symbols. 


Consider the pedigree in Figure 4.15, which shows the inheritance pattern 
expected of a dominant allele. This pedigree represents the inheritance of an 
autosomal dominant mutation, such as that which causes the 
neurodegenerative condition called Huntington’s disease (Book 3, Chapter 3). 


Note that approximately half of the individuals in generation II are affected, 
while two out of three individuals in generation III are affected; in each case, 
only one of the parents of an affected individual is themselves affected. 


= What is the most obvious feature of this pedigree chart with reference to 
the parents of affected offspring? 




Figure 4.15 A typical pedigree showing dominant autosomal inheritance, as 
might be seen for Huntington’s disease, a progressive neurodegenerative condition. 
Generations of the family are indicated by the Roman numerals I, II and III, and 
individuals in each generation by Arabic numerals. 


© Each of the affected offspring has one affected parent. 


= This is a typical pattern seen for a dominant disorder. Can you 
suggest why? 


© A person need only inherit one copy of a dominant disease-causing allele
from their affected parent to develop the disease. 


= Using the nomenclature Hd for the dominant, disease-causing mutant
allele and hd for the recessive allele, what are the genotypes of
individuals 1 and 2 in generation II?


© Individual II-1 must be hd/hd, as she is unaffected. Individual II-2 must
be Hd/hd, as he is affected, and must have inherited an hd allele from his
mother (who is unaffected, and therefore hd/hd). Individual II-2 will have
inherited the mutant Hd allele from his father, who is affected.


In contrast, the pedigree shown in Figure 4.16 would be typical of a recessive 
character, such as albinism, which is characterised by the complete or partial 
absence of pigmentation in the skin, hair and eyes. There are several genetic 
causes of albinism: generally these are due to mutation of a gene required for 
the synthesis of the pigment melanin. Typically for a recessive character, 
individuals showing the albino phenotype do not appear in each generation of 
the pedigree. This can make recognising the heritability of recessive characters 
quite difficult! 


= Looking at Figure 4.16, what can you conclude about the genotypes of
individuals III-1, II-1 and II-2? Use the nomenclature C for the normal
(i.e. wild type) allele and c for the albino allele.


© III-1 is albino: he must be homozygous for the recessive allele (c/c).
Since this form of albinism is recessive, it follows that both his parents
(II-1 and II-2) must be heterozygous (C/c).




Figure 4.16 A typical pedigree showing inheritance of a recessive character, such 
as albinism. 


The heterozygous individuals II-1 and II-2 are referred to as genetic carriers; 
they have inherited the recessive allele associated with albinism, but they do 
not display the phenotype. They are, however, able to pass the recessive allele 
to their offspring. 
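
The genotype proportions expected from the matings discussed in this section can be checked with a short enumeration. This is a minimal sketch for a single autosomal gene, assuming each parent passes on either of its two alleles with equal probability; the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent_a, parent_b):
    """Probability of each offspring genotype for one autosomal gene.

    Each parent is a pair of alleles, e.g. ("C", "c"). Every combination of
    one allele from each parent is taken to be equally likely (1/4 each).
    """
    probabilities = {}
    for allele_a, allele_b in product(parent_a, parent_b):
        genotype = "/".join(sorted((allele_a, allele_b)))
        probabilities[genotype] = probabilities.get(genotype, 0) + Fraction(1, 4)
    return probabilities


# Two carriers of the recessive albino allele (C/c x C/c).
print(offspring_genotypes(("C", "c"), ("C", "c")))

# An affected heterozygote and an unaffected parent (Hd/hd x hd/hd).
print(offspring_genotypes(("Hd", "hd"), ("hd", "hd")))
```

For two carriers, one quarter of offspring are expected to be c/c (albino) and one half to be carriers. For the dominant example, half of the offspring are expected to inherit Hd and be affected.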


4.7.2 Sex linkage in human pedigrees 


Examples of sex-linked genetic disorders in humans include the blood-clotting 
disorder haemophilia and Duchenne muscular dystrophy. Haemophilia is a rare 
condition characterised by a defect in blood clotting, which leads to 
uncontrolled bleeding following injury (among other symptoms). Blood 
coagulation occurs through a cascade of interactions between proteins known 
as clotting factors, and several distinct forms of haemophilia are known, each 
caused by mutation of a gene encoding one of the clotting factors. 


Perhaps the best-known example of haemophilia is seen in the royal houses of 
Europe descended from Queen Victoria. Queen Victoria was heterozygous for 
a mutant allele for haemophilia, and transmitted it to some of her sons and via 
some of her daughters to other generations (Figure 4.17). The haemophilia 
trait is clearly sex-linked, as every affected individual is male. Because the 
mothers of affected males are not themselves affected, the trait must be 
recessive. It is interesting to note that haemophilia was not seen in Queen 
Victoria’s family tree prior to her son, Prince Leopold, Duke of Albany, 
indicating that the mutation responsible for haemophilia in her descendants
may have originated with her. In fact, DNA
analysis of the remains of the Romanov family (Figure 4.17) reveals that the 
form of haemophilia inherited from Queen Victoria was likely to have been 
haemophilia B (Table 4.10). 


[Figure 4.17 pedigree chart: descendants of Queen Victoria (1819-1901), generations I to IX. Key: male with haemophilia; carrier female; female of unknown carrier status.]


Figure 4.17 A pedigree chart of the royal families of Europe. The Romanov family are highlighted in pink and 
individuals with haemophilia are indicated by purple squares. A recessive allele causing haemophilia arose in the 
reproductive cells of Queen Victoria or one of her parents, through mutation. Carriers (heterozygotes) are also shown.
There is uncertainty about the carrier status of some female individuals, shown by a question-mark (?). A number 
inside a circle or a square indicates the number of females or males, respectively; it is a device to save space. 


Table 4.10 Two forms of X-linked haemophilia. Each form affects a different
component of the blood-clotting cascade. Queen Victoria’s family was affected by
haemophilia B.


Form          | Clotting factor absent | Frequency of haemophilia in the male population
haemophilia A | factor VIII            | ~1 in 5000 (~80% of all cases)
haemophilia B | factor IX              | ~1 in 25 000 (~20% of all cases)


= The frequency of phenotypes for recessive X-linked characters in a family
or population is far higher in males than in females. Why should this be
so?


© Males need inherit only one copy of the recessive allele for the
phenotype to be manifest, whereas daughters would have to inherit two
copies of the recessive allele (one from each parent).


The parents of a female with haemophilia would be a father with haemophilia 
and a heterozygous carrier mother — a less likely circumstance, given the 
rarity of haemophilia B in the population. Females who are heterozygous
carriers of the disorder usually produce enough active clotting factor from 
their single gene allele to remain asymptomatic (i.e. they do not manifest 
disease symptoms). 
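
A rough sense of how rare affected females would be can be obtained from the male frequencies in Table 4.10. The sketch below assumes random mating and treats the frequency of affected males as the frequency of the recessive allele among X chromosomes. This simplification is not part of the chapter and ignores new mutations and any effect of the condition on survival and reproduction, so the result is an order-of-magnitude illustration only. The function name is hypothetical.

```python
from fractions import Fraction


def expected_female_frequency(male_frequency):
    """Expected frequency of affected females for an X-linked recessive trait.

    ASSUMPTION: random mating, with the allele frequency q equal to the
    frequency of affected males. A female must inherit the allele on both
    X chromosomes, so her chance of being affected is q * q.
    """
    q = Fraction(male_frequency)
    return q * q


# Male frequencies taken from Table 4.10.
male_frequencies = {
    "haemophilia A": Fraction(1, 5000),
    "haemophilia B": Fraction(1, 25000),
}

for form, q in male_frequencies.items():
    females = expected_female_frequency(q)
    print(f"{form}: about 1 in {females.denominator} females")
```

Under these assumptions, affected females would be expected thousands of times less often than affected males, in line with the observation that they are very rare.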


4.8 Non-nuclear inheritance 


In the first chapter of this book, the origin of life was briefly discussed, 
including theories about the origins of eukaryotes. One of the defining 
characteristics of eukaryotic cells is that many of the cellular functions are 
sequestered in subcellular organelles. For example, the genome of a eukaryotic 
cell is contained within the nucleus, where DNA replication and transcription 
occur (Chapter 2). Two other organelles with distinct functions and with their 
own DNA are mitochondria (the site of ATP synthesis) and chloroplasts (the 
site of photosynthesis). The similarity between the DNA of chloroplasts and 
mitochondria and the genomes of prokaryotes is evidence to support the 
endosymbiosis theory of the origins of eukaryotes, as described in
Section 1.2.4. The structure of prokaryotic and eukaryotic genomes will be
dealt with in more detail in Chapter 5, but for now it is sufficient to note that 
organelle and prokaryotic genomes are small circular DNA molecules. 


During the evolution of eukaryotes, organelle genomes have lost the great
majority of their genes, and those genetic functions are now provided by the 
cell’s nuclear DNA. Nevertheless, some genetic functions do remain within 
the organelle, particularly those genes that relate to the organelles’ roles. For 
example, the mitochondrion retains genes encoding ribosome components 
(required for protein synthesis within the mitochondrion), and genes encoding 
a variety of proteins involved in oxidative phosphorylation, the principal 
biochemical pathway for ATP production (Book 2, Chapter 3). 


4.8.1 Mitochondrial mutation 


Considering the importance of mitochondria in energy metabolism, it is not 
surprising that mutations in mitochondrial genes can have a profound effect on 
the whole organism. However, the inheritance of such mutations does not 
follow normal Mendelian rules. Consider the pedigree in Figure 4.18, which 
shows the pattern of inheritance of a neurodegenerative condition resulting 
from mutation of a mitochondrial gene. 


What can be deduced about the pattern of inheritance in this pedigree? Firstly, 
the condition does seem to be passed from generation to generation, implying 
a genetic cause. However, it is unlikely to be dominant, as no offspring of 
affected males in any of the three generations are themselves affected. 
Similarly, it is unlikely to be recessive. 


= Why is it unlikely to be recessive? 


© If you look at the cases where an affected parent (who would have to be 
homozygous if the trait is recessive) has affected offspring, the unaffected 
parent would need to be heterozygous. In that circumstance, half of the 
children would be affected. In fact, in the cases where affected children 
occur, all the children are affected. 




Figure 4.18 The pattern of inheritance of a neurodegenerative condition caused by mutation of a mitochondrial 
gene. This condition is clearly heritable, but is maternally inherited. 


The clue to how this condition is inherited lies in the observation that in all 
cases of inheritance, the trait is exclusively inherited from the mother, and all 
offspring are affected. The explanation of this pattern of heritability, maternal 
inheritance, is that the mutant gene is within the mitochondrial genome. 
During fertilisation, no mitochondria are transferred from the sperm: 
mitochondria are therefore only inherited from the female parent. Thus, where 
the mitochondrial DNA of a mother carries a mutation, this is inherited by all 
her offspring. 
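
The rule of maternal inheritance described above is simple enough to state as code. The sketch below is a deliberately simplified model in which a mother who carries the mitochondrial mutation passes it to every child and the father's status has no effect. The family records are invented for illustration and do not reproduce Figure 4.18; the function names are hypothetical.

```python
def inherits_mitochondrial_mutation(mother_affected, father_affected):
    """Under strict maternal inheritance only the mother's status matters."""
    return mother_affected


def consistent_with_maternal_inheritance(families):
    """Check family records against strict maternal inheritance.

    Each record is (mother_affected, father_affected, children_affected),
    where children_affected is a list of booleans, one per child.
    """
    return all(
        child == inherits_mitochondrial_mutation(mother, father)
        for mother, father, children in families
        for child in children
    )


# Hypothetical families: affected mother with all children affected,
# and affected father with no children affected.
example_families = [
    (True, False, [True, True, True]),
    (False, True, [False, False]),
]
print(consistent_with_maternal_inheritance(example_families))
```

A single affected child of an affected father and unaffected mother would make the check fail, which is the kind of observation that would argue against maternal inheritance in a real pedigree.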


Activity 4.2 Human pedigree analysis
(LO 4.3) Allow one hour
In this activity, you will learn how to interpret human pedigrees, and, by 
inferring genotypes within these pedigrees, determine whether they reflect 
dominant, recessive or sex-linked inheritance. 


Genetic inheritance can be difficult to study in humans, as not only is their 
life cycle very long, but genetic studies must rely on information from family 
pedigrees rather than matings deliberately set up by a geneticist. The activity 
leads you through the application of what you have learnt about inheritance in 
this chapter to understanding the genetics of inheritance in humans, and how 
you can deduce the patterns of inheritance revealed by the phenotypes within 
a family tree. 


4.9 Discontinuous versus continuous variation 


The simple examples of Mendelian inheritance discussed throughout this 
chapter all concern discontinuous variation, which results in two or more
discrete classes of phenotype with no intermediates — for example, red or 
white eye colour in fruit flies, round or wrinkled peas, normal blood clotting 
or haemophilia. Discontinuous variation usually occurs when a trait is 
controlled by a single gene. Some traits, however, show continuous 
variation, in which the phenotype does not fall into distinct categories, but 
varies over a broad range, with many intermediate values. 


= Can you suggest a human characteristic that displays continuous 
variation? 


© Some examples you may have thought of are height, weight, skin colour, 
intelligence. 


Continuous variation usually results from the combined effect of many genes 
(polygenic inheritance) and is often also significantly influenced by 
environmental effects: for example, the level of nutrition affects height
and weight.
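
One way to see how many genes acting together can blur discrete classes into a near-continuous range is to count the phenotype classes expected as the number of genes increases. The sketch below rests on assumptions that are not made in this chapter: the genes are unlinked, both parents are heterozygous at every gene, each gene has one 'increasing' and one 'neutral' allele with equal and additive effects, and there is no environmental influence. The function name is hypothetical.

```python
from math import comb


def phenotype_distribution(n_genes):
    """Proportion of progeny carrying 0, 1, ..., 2n 'increasing' alleles.

    ASSUMPTION: both parents are heterozygous at all n unlinked genes and
    allele effects are equal and additive, so the phenotype depends only on
    the number of 'increasing' alleles inherited.
    """
    n_alleles = 2 * n_genes
    return [comb(n_alleles, k) / 2 ** n_alleles for k in range(n_alleles + 1)]


for n_genes in (1, 3, 10):
    distribution = phenotype_distribution(n_genes)
    print(n_genes, "gene(s):", len(distribution), "phenotype classes")
```

With one gene there are three classes in the proportions 1/4, 1/2, 1/4; with n genes there are 2n + 1 classes, most progeny fall near the middle of the range, and environmental effects would smooth the steps between classes further.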


Diseases that are inherited as a result of variation in a single gene (monogenic 
inheritance), like those described in Section 4.7, are quite rare in the human 
population. Most of the common human diseases are regarded as 
multifactorial, and are influenced by numerous interacting genes and 
environmental factors. A high risk of coronary heart disease is inherited in 
some families, but can also occur in isolated individuals, and the degree of 
risk is affected by many conditions and characteristics including obesity, 
diabetes, high blood pressure, gender and ethnic group. A great deal of 
research activity is being directed towards analysing the complex genetic 
contribution to disease risk, and you will learn more about the genome 
mapping techniques used in such studies in Chapter 5. 


Summary of Sections 4.7 to 4.9 


• Human pedigree analysis involves deducing genotypes, dominance and
recessivity by interpreting the pattern of phenotypes in a family tree.

• Inheritance of mitochondrial genetic variation follows a maternal
inheritance pattern.

• Mendelian inheritance describes the pattern of inheritance of individual
allelic variants that lead to distinct phenotypes (discontinuous variation).

• Many phenotypes are continuous and result from the combined effect of
variants in many genes and also environmental influences.


4.10 Final word 


In this chapter, you have studied examples of the Mendelian inheritance of 
individual nuclear genes in multicellular eukaryotes, and the important 
contribution made to variation by the mechanisms of sexual reproduction: the 
independent assortment and recombination of chromosomes during meiosis to 
produce haploid gametes, and the fusion of gametes at fertilisation to produce 
a zygote with a new combination of the parental chromosomes.




Much of the variation in multicellular organisms results from the complex 
interaction between multiple genes and environmental factors. The intensive 
study of the molecular basis of inheritance in recent decades has begun to 
unravel much of this complexity. Historically, the adoption of prokaryotic 
organisms for genetic research, particularly the bacterium E. coli and the
bacteriophages (bacterial viruses) that infect it, was instrumental in the
development of molecular biology and molecular genetics. The simpler 
genomes found in prokaryotes led the way in understanding the structure of 
genes, and how their expression can be regulated in response to the cell’s 
environment. The following chapter describes in greater detail the structure 
and function of prokaryotic and eukaryotic genomes, and Chapter 6 looks at 
the control of gene expression. 


4.11 Learning outcomes 


4.1 Describe the mechanisms of cell division (mitosis and meiosis) by which 
the genetic material is segregated into new cells and explain the significance 
of meiosis and sexual reproduction for the inheritance of variation. 


4.2 Explain how the inheritance of dominant and recessive, linked and 
unlinked alleles determines the frequencies of different genotypes and 
phenotypes in a given genetic cross. 


4.3 Interpret simple human pedigrees and identify the mechanisms 
of inheritance. 




Chapter 5 Genes and genomes 




5.1 Introduction 


Chapter 4 introduced you to the patterns of inheritance of genetic information, 
and to the concept of the gene as the unit of inheritance. In this chapter you 
will learn more about the molecular basis of inheritance. 


The genetic material of all organisms is deoxyribonucleic acid (DNA), a 
molecule with characteristics that are particularly suited to its role in 
propagating heritable characteristics from one generation to the next (only 
certain viruses use ribonucleic acid (RNA) as their genetic material,
Section 5.9). The gene, therefore, as well as being the heritable unit, can also
be defined as a region of DNA that specifies the amino acid sequence of a 
polypeptide, or in some cases a functional RNA molecule (such as a 
ribosomal or transfer RNA). Genes are usually associated with a more 
extensive region of DNA containing regulatory sequences that determine when 
the gene product is expressed, and the level of its expression. Historically, the 
prevailing view was that the great majority of genes encoded proteins, but as 
molecular biologists have taken an ever-closer look into the structure and 
function of genes, it has become clear that this picture is overly simplistic. In 
addition to those genes that encode proteins, ribosomal RNAs and transfer 
RNAs, there appear to be many genes that encode other essential RNAs; for 
example, in Chapter 6 you will learn more about microRNAs, which play a 
pivotal role in the regulation of gene expression. The entirety of an organism’s 
hereditary information, known as its genome, includes all of these genes and 
regulatory sequences and, particularly in eukaryotes, an often large amount of 
DNA with poorly understood or no apparent function. 


You should now complete Activity 5.1 (Part 1) which introduces the 
fundamental characteristics of DNA that relate to its role as the genetic 
material of the cell. In the rest of this chapter, you will discover how gene and 
genome structure are reflected in gene activity in prokaryotes and eukaryotes. 
You will also look at some of the technologies that are being used to study 
and compare genome sequences, and how these will shape the future direction 
of genetic research, particularly with respect to human health and disease. 


(LOs 5.1 to 5.3) Allow 20 minutes 

This activity explores the structure and characteristics of DNA, and how the 
genomic DNA is copied or replicated in order to be passed on to daughter 
cells. You may already be familiar with this process from earlier studies, but 
are recommended to review this material before proceeding with this chapter. 
Part 2 of this activity later in the chapter looks at how the information in 
DNA specifies the amino acids in an encoded protein. 




A crystal is a solid formed by a 
repeating, three-dimensional 
array of atoms, molecules, or 
ions. 


Although B-DNA is the most 
common form, X-ray diffraction 
studies have shown that DNA 
can occasionally adopt other 
helical forms under certain 
conditions, for example Z-DNA, 
which is a left-handed helix. 




5.2 DNA is the genetic material 


DNA was definitively identified as the genetic material by Avery, MacLeod 
and McCarty’s work in 1944, which demonstrated that DNA purified from a 
virulent (i.e. disease-causing) strain of the bacterium Streptococcus 
pneumoniae was the ‘transforming principle’ which, when introduced into a 
harmless non-virulent strain of the bacterium, transformed it into a virulent 
form. The role of DNA as the hereditary material was therefore already well- 
accepted by the time the double-helical structure of DNA was proposed by 
Crick and Watson in 1953. Since then, molecular biology has begun to reveal 
the enormous complexity and variety of gene structure, and the mechanisms
that regulate gene expression (the synthesis of functional polypeptides and
RNAs).


In purely chemical terms, natural DNA molecules are the largest biomolecules 
known: an estimate of the total length of the DNA molecules that comprise 
one copy of the human haploid genome (approximately 3 × 10⁹, or
3000 million, base pairs) is of the order of 1.8 metres, assuming the DNA
molecule in each chromosome is laid end-to-end. How can such long 
molecules be compacted and packaged into a cell, and how can they be 
accurately and rapidly copied during each cell division? 


5.2.1 Features of DNA structure that impact on its role 


The molecular structure of DNA was first determined by Francis Crick and 
James Watson, building largely on the data of Rosalind Franklin. Franklin
used the technique of X-ray diffraction, in which DNA crystals were irradiated 
with X-rays to obtain photographic images showing how the X-rays are 
scattered by collision with the individual atoms in DNA. (You will encounter 
X-ray diffraction again in Book 2, Chapter 1, in relation to the investigation 
of protein structure.) Watson and Crick’s insight was that the regular spots and 
blobs in those photographic images revealed that DNA has a simple (in 
macromolecular terms) right-handed, double-helical structure with about 10 
base pairs per helical turn. This has since come to be known as the Watson— 
Crick structure, also known as B-DNA. Figure 5.1a shows the Watson—Crick 
structure of a short section of DNA (you can observe the structure of the 
DNA more clearly using the interactive model of DNA in Activity 5.1, 

Part 1). The arrangement of the sugar—phosphate backbone of each strand, and 
the base pairs which connect the two backbones in a ladder-like arrangement, 
can be seen schematically in Figure 5.1b. 


If you look closely at the helical structure in Figure 5.1a and b, you can see
that the twisting of the ladder-like double-stranded DNA creates two
unequally sized grooves running along the molecule between the two
sugar–phosphate backbones. These are referred to as the major and minor grooves.
Proteins that have a role in recognising and binding to specific DNA 
sequences do so by reaching their target nucleotides through the major or 
minor grooves. 


Figure 5.1 The DNA double helix. (a) A molecular model showing the
double-helical structure of DNA. Two sugar–phosphate backbones intertwine and are
linked by non-covalent hydrogen bonding (represented by white ‘dashed’ bonds)
between the bases on each strand. (b) A schematic view of the two strands and the
hydrogen bonding between base pairs. The ‘major’ and ‘minor’ grooves of the
double helix, i.e. the different-sized spaces between the two backbones, are
indicated in both (a) and (b).


Figure 5.2 shows schematic views of the molecular structure of DNA. The 
two DNA strands are polymers of units called nucleotides (Figure 5.2a). Each 
nucleotide in DNA consists of a ring-shaped 5-carbon deoxyribose sugar with 
a base attached to the first (one-prime, 1’) carbon and a phosphate group (a 
phosphorus atom bonded to four oxygen atoms by three single bonds and one 
double bond) attached to the fifth (five-prime, 5’) carbon (Figure 5.2a). The 
backbone of each strand is formed by covalent bonding between the 
deoxyribose sugar (represented by pentagons in Figure 5.2b) of one nucleotide 
and the phosphate group (represented by circles in Figure 5.2b) of the next 
nucleotide. The phosphate groups in the backbone carry negatively charged
oxygens (Figure 5.2a), so DNA molecules have an overall negative charge.
The two DNA strands are held together via non-covalent hydrogen bonds
(shown in purple in Figure 5.2b) between the bases. These bonds are weaker
than covalent bonds, making it relatively easy for short stretches of the two
strands of DNA to be separated from each other.




Figure 5.2 The molecular structure of DNA. (a) The chemical structure of a nucleotide unit. Note that the base (in
this case adenine) is attached to the 1’ carbon of the deoxyribose sugar, the phosphate group is linked to the
5’ carbon, and the 3’ carbon carries a hydroxyl group (-OH). To the right of the chemical structure is a schematic
representation of the nucleotide unit. (b) The two strands of DNA are polymers of nucleotides oriented in opposite
directions (antiparallel). (c) The DNA base pairs: the bonding between adenine and thymine involves two hydrogen
bonds; the bonding between guanine and cytosine involves three hydrogen bonds.


■ When do DNA strands need to be separated?


□ DNA strands need to be separated to act as templates for DNA
replication.


The base pairing is specific: adenine (A) always pairs with thymine (T), and
guanine (G) always pairs with cytosine (C) (Figure 5.2c). Thymine and
cytosine have a single ring of carbon and nitrogen atoms in their structure and
are referred to as pyrimidines, while adenine and guanine have two rings and
are referred to as purines. Each of the base pair combinations involves a
purine (A or G) and the pyrimidine with which it pairs (T and C,
respectively). It follows from the specificity of base pairing that the sequence
of one strand can always be inferred from the sequence of the other: the two 
strands are said to be complementary. For convenience, DNA sequences are 
usually represented as sequences of letters representing the four bases, so the 
sequence of DNA in Figure 5.2b can be represented as: 


3’ ATACAGTC 5’
5’ TATGTCAG 3’
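
The pairing rules can be checked with a few lines of code. The following is a minimal sketch in Python, not part of the module materials; the function names `complementary_strand` and `hydrogen_bonds` are illustrative assumptions. It derives the partner of the lower strand shown in Figure 5.2b and counts the hydrogen bonds in the duplex, using two bonds per A:T pair and three per G:C pair (Figure 5.2c).

```python
# Watson-Crick pairing rules from the text: A pairs with T, G pairs with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Hydrogen bonds per base pair (Figure 5.2c): two for A:T, three for G:C.
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def complementary_strand(strand_5_to_3):
    """Return the partner strand, also written 5' to 3'.

    The partner is antiparallel, so the paired bases are read in reverse.
    """
    return "".join(PAIR[base] for base in reversed(strand_5_to_3))


def hydrogen_bonds(strand):
    """Count the hydrogen bonds holding a duplex of this sequence together."""
    return sum(H_BONDS[base] for base in strand)


lower = "TATGTCAG"                    # lower strand of Figure 5.2b, 5' to 3'
upper = complementary_strand(lower)   # partner strand, 5' to 3'
print(upper)                          # partner read 5' to 3'
print(upper[::-1])                    # same strand written 3' to 5'
print(hydrogen_bonds(lower))          # total hydrogen bonds in the duplex
```

Reversing the result gives the partner strand in the 3’ to 5’ orientation used in the two-line representation above, which is why the function reads the input backwards.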


■ In Figure 5.2, what feature of the A:T and G:C base pairing ensures that
the sugar–phosphate backbones are held at a constant distance from each
other?


□ The overall length of an A:T and a G:C base pair is approximately equal
(Figure 5.2c), so the distance between the two sugar–phosphate
backbones remains constant all along the DNA helix.


The discovery of the double-helical structure of DNA immediately suggested a 
mechanism both for the accurate replication of DNA and the inheritance of 
genetic information. The specificity of base pairing means that each DNA 
strand can act as a template for the synthesis of a second strand during DNA 
replication. Watson and Crick refer to this crucial observation at the end of 
their 1953 paper: 


...it has not escaped our notice that the specific pairing we have postulated 
immediately suggests a possible copying mechanism for the genetic material 
and is key to the function of DNA as a repository for genetic information. 


Note from Figure 5.2b that the two DNA strands are oriented in opposite
directions. In the left-hand strand, the oxygen (O) atoms in the deoxyribose
sugar rings are oriented upwards, while in the other strand they are oriented
downwards. Each strand of DNA has at one end a ‘free’ phosphate
group attached to the 5’ carbon of the terminal deoxyribose sugar ring and at
the other a ‘free’ hydroxyl group attached to the 3’ carbon of the terminal
deoxyribose sugar ring. These ends are commonly referred to as the
‘5’ end’ and the ‘3’ end’, respectively, as indicated in the figure. This
opposing arrangement of the two DNA strands is described as antiparallel.


During DNA replication, both DNA strands are used as templates for new 
strand synthesis (Figure 5.3a). The new DNA is synthesised from individual 
nucleotide precursors referred to as dNTPs (deoxyribonucleotide 
triphosphates), specifically dATP, dCTP, dGTP and dTTP, which have three 
linked phosphate groups (Figure 5.3b). A large amount of chemical energy is 
stored in the bonds attaching the two terminal phosphates (these molecules are 
similar to ATP, described in Section 1.2.1). As each dNTP is incorporated into
the new DNA strand, the high energy bonds linking two of the phosphates are
broken and the energy released is used to create a covalent bond known as a
phosphodiester bond between the 5’ phosphate of the nucleotide and the free
3’ hydroxyl group of the existing DNA chain (Figure 5.3c and d). Hence DNA
synthesis can only ever proceed in a 5’ to 3’ direction and this has important
consequences for the mechanisms of synthesis, as you will see in the next
section.




Figure 5.3 DNA replication. (a) Representation of the DNA duplex during replication showing that each of the new
strands is synthesised in a 5’ to 3’ direction. Hence the two new strands are synthesised in opposite directions as
shown by the arrows. (b) Structure of a dNTP (in this case dATP) shown in an abbreviated form without ring
carbons. (c) Incorporation of a nucleotide into the DNA strand requires base pairing specificity. (d) A covalent
phosphodiester bond is formed by reaction between the phosphate group of the dNTP and the 3’ -OH of the
preceding nucleotide in the chain.


5.3 DNA replication


The entire genome of a cell must be replicated prior to cell division
(Activity 4.1). This is a formidable task, even for those cells with a relatively
small genome. The genome of the bacterium Escherichia coli, a circular DNA
molecule of around 4.6 × 10⁶ base pairs (bp), is copied at about 1000 base
pairs per second. Replication begins at a single point, a particular sequence of
bases known as the origin of replication, and is completed in approximately
40 minutes. In contrast, the haploid human genome is much larger, about
3 × 10⁹ bp, and its replication proceeds at a much slower rate: about 50 base
pairs per second.


■ Calculate the approximate time it would take to copy just one human
chromosome containing 150 × 10⁶ base pairs at this rate of replication.


□ It would take 3 × 10⁶ seconds, about 800 hours.
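
The arithmetic behind this answer can be laid out as a short calculation. This is a rough sketch in Python rather than anything from the module itself; the function name `replication_time_hours` and its `forks` parameter are assumptions introduced for illustration. It also assumes that the quoted rates apply to a single replication fork, and that the E. coli genome is copied by two forks moving in opposite directions from its one origin (replication forks are described later in this section).

```python
def replication_time_hours(base_pairs, rate_bp_per_s, forks=1):
    """Time to copy `base_pairs` when `forks` forks each copy at the given rate."""
    seconds = base_pairs / (rate_bp_per_s * forks)
    return seconds / 3600


# One human chromosome of 150 million bp copied from one end at 50 bp per second.
human_chromosome = replication_time_hours(150e6, 50)

# E. coli genome (4.6 million bp) at 1000 bp per second, with one fork or two.
e_coli_one_fork = replication_time_hours(4.6e6, 1000)
e_coli_two_forks = replication_time_hours(4.6e6, 1000, forks=2)

print(round(human_chromosome))        # hours for the human chromosome
print(round(e_coli_one_fork * 60))    # minutes with a single fork
print(round(e_coli_two_forks * 60))   # minutes with two forks
```

Under these assumptions the human chromosome needs a little over 800 hours, while the two-fork estimate for E. coli comes out close to the figure of approximately 40 minutes quoted above.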


In fact, it only takes about 8 hours for a cell to copy the entire human 
genome. This is possible because replication is initiated not just at one, but at 
multiple origins of replication; it is estimated that there are around 20 000 
replication origins in the human genome. Furthermore, DNA replication is 
also highly accurate; it is estimated that approximately one error is made for 
every 10° base pairs synthesised. How is this remarkable speed and fidelity 
achieved? 


A number of molecules and enzymes are needed for DNA replication; these 
form a large protein complex, known as the replication complex, which 
recognises and binds to origins of replication. The replication complex 
includes DNA polymerases, the enzymes that actually assemble the 
nucleotides into the new DNA strands. However, before DNA polymerase can 
act, the two DNA strands must be separated to allow the polymerase access to 
the individual template strands. This is achieved by enzymes known as 
helicases which separate the base pairs and hence unwind the two DNA 
strands (Figure 5.4a). The exposed stretches of single-stranded DNA are 
immediately protected by single-strand DNA binding (SSB) proteins which 
prevent them from reannealing to reform the duplex. 


The separation of the strands creates a so-called replication bubble with a 
replication fork at each end (Figure 5.4b). Helicases continue to unwind the 
DNA at each of these forks, extending the replication bubble in both 
directions. As the duplex is separated, the DNA helix ahead of the fork
becomes overwound, creating torsional stress
(think of two strings wound around each other: if you try to pull them apart
from one end, the twists will tighten along the rest of the length). If
uncorrected, this stress would eventually become too strong for the helicase to
overcome, and strand separation would cease. How this problem, often known
as the ‘topological problem’, is overcome was a primary concern in the years
following the publication of the Watson–Crick model for the helical structure
of DNA. The mystery was solved by the discovery of the enzyme DNA
topoisomerase, which ‘nicks’ one of the DNA strands; that is, it cuts the
sugar–phosphate backbone, allowing the DNA helix to swivel around itself,
and then reseals it.


DNA polymerases follow along as the DNA is unwound at the replication 
fork and synthesise a new DNA strand using each of the separated strands as 
templates. Free nucleotides are sequentially incorporated using the A:T and
G:C base pairing specificity to ensure the appropriate nucleotide is incorporated
in the order determined by the template strand. DNA polymerase catalyses the 
polymerisation of DNA by forming the covalent phosphodiester bond between 
a free nucleotide and the 3’ end of the new DNA strand (Figure 5.3c and d). 
DNA polymerases make a base pairing error approximately once in every 




Figure 5.4 (a) The roles of proteins involved in DNA replication. DNA helicase separates and unwinds the two 
strands so that they can act as templates for DNA synthesis. DNA topoisomerase nicks the DNA to prevent 
supercoiling ahead of the replication fork. Single-stranded DNA at the replication fork is prevented from reannealing 
and protected by single-strand DNA binding proteins. One DNA polymerase molecule in the complex performs 
continuous leading-strand synthesis on one of the unwound DNA template strands, while a second DNA polymerase 
molecule synthesises the lagging strand on the other template strand. The lagging strand is synthesised as a series of 
short Okazaki fragments, each initiated by an RNA primer generated by the enzyme primase. Another DNA 
polymerase molecule and a ligase (not shown) follow along removing RNA primers from the lagging strand, filling in 
the gaps with DNA and ligating the fragments together to form a complete DNA strand. (b) Schematic representation 
of a replication bubble (a region where replication complexes have separated the two DNA strands). The replication 
bubble has an active replication fork at each end (the box shows how the right-hand fork corresponds to (a) above). 
The replication complexes associated with the two replication forks move in opposite directions, extending the 
replication bubble and performing leading- and lagging-strand synthesis as the template DNA is unwound. 


10° bp synthesised; however, there is a built-in ‘proof-reading’ mechanism. 
The DNA polymerases can also act as nucleases; that is, they can also break
the links between nucleotides. Each time a new nucleotide is added, the
polymerase ‘checks’ that the base pair that it has just formed is correct, and if
not, the polymerase reverses and its nuclease activity removes the incorrect 
nucleotide before synthesis is resumed. Overall therefore, DNA replication is 
extremely accurate with error rates typically around one error per 10” bases. A 
final check on the accuracy of the newly formed DNA is performed by repair 
enzymes (Section 5.4.1). Ultimately, the combination of proof-reading and 
repair activities will correct the great majority of errors. 


The synthesis of DNA by DNA polymerase has two important limitations. 
Firstly, the polymerase can only catalyse the addition of nucleotides to a pre- 
existing nucleic acid strand. Secondly, it can only catalyse the progressive 
addition of nucleotides in one direction: nucleotides may only be added to the 
3’ end of a DNA strand (Figure 5.3c). This raises two questions. The first 
question is: how is the synthesis of a new strand started, or put another way, 
what provides the DNA polymerase with a free 3’ -OH end to which it can 
add the first nucleotide? The second question is: how can the two antiparallel 
DNA strands both be copied simultaneously, if synthesis can only proceed in 
one direction (5' to 3’)? 


The answer to the first of these questions is that a new DNA strand is in fact 
initiated from a short RNA molecule, called an RNA primer, which binds or 
anneals by complementary base pairing to the single-stranded template DNA
exposed as the DNA strands are separated by DNA helicase (Figure 5.4). The
RNA primer molecules are synthesised by another enzyme called a primase.
Annealing of a primer to the DNA template strand thus provides the starting
point (a free 3’ -OH) to which the polymerase can start adding nucleotides.


The second question relates to how replication can proceed on both of the 
DNA template strands simultaneously. Consider Figure 5.4b, which shows the 
two growing replication forks, which are moving in opposite directions from 
the replication origin, creating an expanding replication bubble as the double 
helix is unwound. You can see that each strand of the parental double helix 
acts as a template on which a new strand is synthesised. Each daughter DNA 
molecule therefore consists of one newly synthesised strand (pink in the 
figure) and one parental strand (purple). Remember that DNA can only be 
synthesised in a 5’ to 3’ direction (thus ‘reading’ the template DNA strand in 
the 3’ to 5’ direction). One of the new DNA strands is synthesised as one long 
continuous molecule (starting at the replication origin and growing in the 
direction the replication fork is moving) and is known as the leading strand. 
However, in order to synthesise the other new strand in the 5’ to 3’ direction, 
the polymerase molecule synthesising this strand must move along the 
template DNA in the opposite direction to the movement of the replication 
fork. The synthesis of this strand therefore occurs in short sections and it is 
known as the lagging strand. Each of the short sections of the lagging strand
(known as Okazaki fragments after Reiji Okazaki who discovered them
in 1966) is initiated by annealing of an RNA primer to the template DNA
strand. As the replication complex proceeds along the lagging strand, the 
RNA primers are removed, the gaps they leave are filled in with DNA and 
then the short DNA fragments are covalently linked together by another 
enzyme called DNA ligase, thereby assembling the separate fragments into a 
continuous DNA strand. In prokaryotes, which usually have a single circular
DNA molecule (Section 5.7.3), the diverging replication forks meet up on the
other side of the circular DNA, yielding two separate copies of the
double-stranded DNA (each containing one template strand and one newly
synthesised strand). In eukaryotes, the replication forks originating from
multiple origins meet up with each other, ultimately resulting in two separate
copies of each linear chromosomal DNA molecule.
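
The difference between leading- and lagging-strand synthesis can be illustrated with a toy model. The sketch below is written in Python and is only a simplification under stated assumptions: RNA primers, primer removal and gap filling are left out, a fixed fragment length is used, and the function names (`synthesise`, `okazaki_fragments`, `ligate`) are hypothetical. Each template is supplied in the order in which a fork moving from left to right would expose it.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesise(template_3_to_5):
    """Copy a template read 3' to 5'; the product is written 5' to 3'."""
    return "".join(PAIR[base] for base in template_3_to_5)


def okazaki_fragments(lagging_template_5_to_3, fragment_length):
    """Copy the lagging-strand template in short sections.

    The template is exposed 5' to 3' as the fork advances, so each newly
    exposed window has to be copied backwards, away from the fork.
    """
    fragments = []
    for start in range(0, len(lagging_template_5_to_3), fragment_length):
        window = lagging_template_5_to_3[start:start + fragment_length]
        fragments.append(synthesise(window[::-1]))  # read the window 3' to 5'
    return fragments


def ligate(fragments):
    """Join the fragments; the most recently made one lies at the 5' end."""
    return "".join(reversed(fragments))


# The two parental strands of Figure 5.2b, in the order a fork exposes them.
leading_template = "ATACAGTC"   # runs 3' to 5' in the direction of fork movement
lagging_template = "TATGTCAG"   # runs 5' to 3' in the direction of fork movement

leading = synthesise(leading_template)              # one continuous strand
fragments = okazaki_fragments(lagging_template, 3)  # several short sections
lagging = ligate(fragments)

print(leading)
print(fragments)
print(lagging)
```

In this model the leading strand is produced in a single pass, whereas the lagging strand only becomes a continuous 5’ to 3’ molecule once its fragments have been joined, which mirrors the role of DNA ligase described above.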


The animation at the end of Activity 5.1 (Part 1) showed a dynamic view of 
the replication complex in action, based on structural studies of the proteins in 
the complex. In this case, the parental DNA helix can be seen entering the 
replication complex from the left (this image is in the opposite orientation to 
Figure 5.4a), the leading strand is exiting downwards, and you can see the 
template for the lagging strand being spooled out in a big loop as Okazaki 
fragments are synthesised (Figure 5.5).


Figure 5.5 A model for the DNA replication complex, showing how the DNA
strands form a complicated looping structure around the site of synthesis.


The ability of isolated DNA polymerases to carry out DNA synthesis on a 
purified DNA template in a test tube is widely exploited in molecular biology, 
most notably in the polymerase chain reaction, one of the most influential 
molecular biology techniques developed in recent decades (Box 5.1), and in 
DNA sequencing (Box 5.3).


Box 5.1 The polymerase chain reaction 


The polymerase chain reaction (PCR) is a technique that permits the 
rapid amplification of a DNA fragment from a vanishingly small amount 
of starting material isolated from cells. It yields sufficient quantities
(micrograms) of accurate DNA copies for techniques like DNA
sequencing and gene cloning (described later in this chapter) and is a
particularly powerful and versatile technique; indeed, its main inventor,
Kary Mullis, earned a Nobel Prize.


PCR makes use of the naturally occurring enzyme activities that replicate 
the double-stranded DNA in cells. To synthesise DNA in vitro, a number 
of biomolecules are required: 


• a double-stranded template DNA


• DNA polymerase (the enzyme that catalyses the addition of
nucleotides to a new DNA strand)


• the four dNTPs (dATP, dGTP, dCTP and dTTP)


• two oligonucleotides (short lengths of single-stranded DNA, generally
around 20 nucleotides long) to act as primers, one for each DNA
template strand.


■ Why does DNA synthesis require oligonucleotides to act as primers?


□ Recall that the DNA polymerase cannot initiate the synthesis of a
DNA strand. Instead, polymerisation begins by addition of
nucleotides to an existing short nucleic acid primer annealed to the
template strand.


There are two key aspects of PCR that make it so useful. Firstly, the 
sequences of the two oligonucleotide primers determine exactly which 
section of a DNA molecule will be copied by the polymerase. The 
specificity of complementary base pairing between a primer and its target 
DNA means that a particular sequence may be specifically targeted and 
amplified from a much larger mass of other DNA sequences. Secondly, 
repeated rounds of DNA synthesis, in which each newly synthesised 
section of DNA itself becomes a template, lead to an exponential 
increase in the quantity of the specific DNA product. 


Each round of DNA synthesis has three phases: 


1 denaturation: separation of the strands of DNA to yield two single- 
stranded DNA templates 


2 primer annealing: base pairing of the oligonucleotide primers to their 
complementary sequences 

3 elongation: synthesis of two new DNA strands by DNA polymerase, 
initiated from the oligonucleotide primers. 


The method is remarkably simple. However, it is first necessary to know 
part of the DNA sequence of the region to be amplified in order to 
chemically prepare the short synthetic DNA primers for initiating the 
reaction. The two primers are complementary to the ends of the region 
that is to be amplified, and are usually about 20 nucleotides long. The 
DNA polymerase used for PCR is a thermostable (heat-resistant) enzyme, 
derived from a bacterium adapted to living in high temperature 
environments: for example, the commonly used Taq DNA polymerase is
derived from the bacterium Thermus aquaticus, which inhabits hot
springs.

A tube containing the reaction mixture of target DNA, primers,
nucleotides and Taq polymerase is first placed at 94 °C, which denatures
the template DNA (disrupting the weak hydrogen bonds between base
pairs) and separates it into the two single strands. The reaction is then
transferred to a lower temperature (usually around 55-58 °C; the exact
temperature depends on the sequence of the primers) to allow the primers
to anneal (i.e. bind) to the template DNA strands via complementary
base pairing. Finally, the reaction is incubated at 72 °C for
polymerisation (elongation) to occur (Taq polymerase functions at an
optimal temperature of 72 °C). This represents one cycle of denaturation,
primer annealing and elongation (Figure 5.6a). The thermostable Taq
polymerase is resistant to denaturation at 94 °C (a temperature that
irreversibly damages most enzymes), so the reaction mix can be
subjected to 30 or 40 of these cycles without the addition of fresh
enzyme. Repeating the cycle 30-40 times causes amplification of the
short target sequence (Figure 5.6b). As long as the dNTPs and primers
are present in excess, the amount of target amplified grows exponentially.
By the later cycles, the primers and free dNTPs begin to be depleted as
they are consumed by synthesis, and the amplification rate slows.
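
The exponential phase of amplification can be illustrated with an idealised calculation. The sketch below, in Python, is not a description of any real PCR instrument or software. It assumes that a fixed fraction of the templates (the hypothetical `efficiency` parameter) is copied in every cycle, and it ignores both the depletion of primers and dNTPs in later cycles and the fact that products of exactly the target length only appear after the first few cycles.

```python
def pcr_copies(starting_copies, cycles, efficiency=1.0):
    """Idealised number of target copies after a given number of cycles.

    With efficiency 1.0 every template is copied once per cycle, so the
    amount doubles each time; lower values model incomplete copying.
    """
    return starting_copies * (1 + efficiency) ** cycles


# Perfect doubling from a single template molecule.
for cycles in (10, 20, 30):
    print(cycles, pcr_copies(1, cycles))

# An illustrative (assumed) efficiency of 90 per cent per cycle.
print(pcr_copies(1, 30, efficiency=0.9))
```

Even with perfect doubling, 30 cycles yield only about a thousand million copies of one starting molecule, which helps to explain why 30 to 40 cycles are typically used.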


Summary of Sections 5.1 to 5.3 


All organisms use DNA as their genetic material. The double-stranded 
structure of the DNA double helix, with its complementary base pairing 
between the bases of the two strands, is key to the stability of DNA, the 
mechanism for efficient and accurate DNA replication (copying) and its 
ability to encode genetic information. 


DNA replication involves a large protein complex (the replication 
complex) comprising a number of proteins including: DNA helicase, which 
unwinds and separates the two DNA strands; primase, which provides 
RNA primers that anneal to the DNA template strands and initiate new- 
strand synthesis; and DNA polymerase, which links nucleotides to the new 
DNA strands. 

DNA synthesis is unidirectional, 5’ to 3’ (so the template is always ‘read’ 
3' to 5’), which has implications for replication. While the leading strand 
can be synthesised as one continuous DNA molecule, the lagging strand is 
synthesised in short sections, which are finally joined into a complete 
strand through the action of DNA ligase. 

DNA polymerases have a proof-reading function which monitors the newly 
synthesised DNA strand for misincorporation of nucleotides. 

Thermostable forms of DNA polymerase can be used in vitro to carry out 
the polymerase chain reaction (PCR), a method of amplifying a specific 
section of DNA using synthetic primers to initiate DNA synthesis. 



Figure 5.6 The polymerase chain reaction. (a) Each cycle consists of three phases: denaturation, primer annealing 
and elongation. During denaturation, the DNA is heated to approximately 94 °C to separate the strands. In the 
annealing phase, the temperature is reduced, permitting the short DNA primers to anneal to complementary sequences 
in the template strand. During elongation, the temperature is elevated to 72 °C, the optimum temperature for 
thermostable DNA polymerase, and DNA synthesis occurs. (b) After the first round, the product itself becomes a 
template so that after several cycles of PCR, there are many copies of a short double-stranded section of DNA. 
Typically, 30-40 cycles of amplification will be carried out, leading to an exponential increase in the mass of the 
target sequence. 


5.4 DNA repair 


Any unrepaired errors that occur during DNA replication or as a result of 
damage to DNA will give rise to mutations, permanent changes in the 
genomic DNA sequence of the next generation of cells. If an error is not 
repaired, it will be copied when the cell divides and the DNA of subsequent 
cells will contain the change. Clearly, when this happens, the information 
carried by the DNA is also changed and in some cases this may have 
profound effects on expression of gene products (Section 5.6). However, all 
organisms have a number of DNA repair mechanisms. 




Reactive oxygen species (ROS) 
are highly chemically reactive 
molecules containing oxygen 
atoms with single unpaired 
electrons. ROS are generated by 
natural metabolic processes 
within cells, and are constantly 
removed to prevent them from 
damaging other molecules. 




5.4.1 Correction of errors in replication 


While the innate proof-reading capacity of DNA polymerase ensures a very 
high degree of accuracy during DNA replication, occasionally incorrect 
nucleotide incorporation is overlooked. Cells are able to correct nucleotide 
mismatches that fail to conform to normal base pairing rules by a process 
known as mismatch repair (MMR). 


Mismatch repair enzymes can recognise the newly synthesised strand and 
ensure that, where there are mismatches, the misincorporated nucleotide in the 
new strand (not the template strand) is the one replaced. In some bacteria, 
DNA becomes methylated some time after replication (methyl groups are 
added to some of the nucleotides) and the MMR enzymes appear to recognise 
which is the newly synthesised DNA strand by its lack of methylation just 
after replication. The recognition mechanism in eukaryotes is less clear, but 
may involve detection of temporary single-stranded breaks, known as ‘nicks’, 
in the newly synthesised DNA strand. Recall that A:T and G:C base pairs are 
of equal length and allow the sugar–phosphate backbones of the DNA duplex
to be maintained a constant distance apart.


■ How might repair enzymes detect a base pairing mismatch, such as A:C?


□ The mismatched base pair is an inappropriate length and will cause a
distortion in the DNA helix.


A complex of MMR enzymes nicks the new strand close to the distortion and
unwinds a short stretch of the helix. An exonuclease (an enzyme that
removes nucleotides one at a time from the end of a polynucleotide chain)
then degrades a section of the new strand, starting at the nick and including the
mismatch. A DNA polymerase then fills in the gap left in the strand by
synthesising new DNA in the 5’ to 3’ direction, using the other, intact DNA
strand as template. The enzyme DNA ligase completes the repair by joining
the sugar–phosphate backbone of the newly synthesised section of DNA to the
end of the strand, so that there are no remaining gaps (you encountered DNA
ligation earlier in this chapter, in connection with joining of the short,
single-stranded Okazaki fragments during lagging-strand DNA synthesis).
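
The logic of mismatch repair can be sketched as a simple procedure. The Python below is a toy illustration only: the function names and the `flank` parameter (how many nucleotides on each side of the mismatch are removed) are assumptions, and the two strands are written so that position i of one pairs with position i of the other. The point it makes is that the new strand is always corrected against the template, never the other way round.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def find_mismatches(template, new_strand):
    """Return the positions where the new strand breaks the pairing rules."""
    return [i for i, (t, n) in enumerate(zip(template, new_strand))
            if PAIR[t] != n]


def mismatch_repair(template, new_strand, flank=2):
    """Replace a stretch of the new strand around each mismatch.

    The removed section is resynthesised using the template as a guide.
    """
    repaired = list(new_strand)
    for i in find_mismatches(template, new_strand):
        start = max(0, i - flank)
        end = min(len(template), i + flank + 1)
        for j in range(start, end):          # degrade, then fill in the gap
            repaired[j] = PAIR[template[j]]
    return "".join(repaired)


template = "ATACAGTC"
faulty = "TATGCCAG"      # C inserted opposite A at position 4 (an A:C mismatch)

print(find_mismatches(template, faulty))
print(mismatch_repair(template, faulty))
```

In a cell, the choice of which strand to correct depends on recognising the newly synthesised strand, as described above; here that choice is simply built into the function arguments.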


5.4.2 Repair of damaged DNA 


Although most mutations are believed to be caused by replication errors, 
organisms are also subject to continual assault by a variety of agents that can 
damage biological macromolecules, including DNA. These agents include 
electromagnetic radiation such as ultraviolet light, X-rays and gamma-rays; 
toxic chemicals in the environment, such as those found in cigarette smoke, 
and also chemicals generated as a by-product of normal biochemical activity, 
such as reactive oxygen species. Each of these agents produces characteristic 
types of molecular damage to DNA (often referred to as molecular lesions)
that can affect function. These lesions are corrected by a number of different 
repair processes, many of which are able to accurately recognise and repair 
errors in one strand using the intact complementary DNA strand as a template. 
Several DNA repair processes exist in both prokaryotic and eukaryotic cells,
and many of the proteins involved have been highly conserved in many
different species throughout evolution.


Base excision repair 

As its name suggests, base excision repair (BER) is a cellular mechanism 
whereby single damaged bases are removed from the DNA. Damage to bases 
is often a spontaneous event caused by normal chemical reactions in the cell. 
A common example is the deamination of cytosine: the replacement of an 
amino group (-NH2) in a cytosine base by a carbonyl group (C=O), which
converts it into another type of base called uracil (Figure 5.7a). 




Figure 5.7 (a) Deamination of cytosine forms uracil, which can base-pair with 
adenine, but is not normally found in DNA. (b) If the uracil is not removed prior 
to DNA replication, it can lead to a replacement of a C:G base pair with a T:A 
base pair. 


Uracil is normally found in RNA (where it replaces thymine) but not in DNA,
and its presence in DNA is problematic for DNA replication, since uracil pairs
with adenine.




Figure 5.8 The base excision repair (BER)
pathway which removes damaged bases. In this
example, deamination has converted a cytosine
into a uracil residue. The uracil is removed,
leaving an abasic site, which is itself cleaved
from the DNA, leaving a short gap in one
strand. The gap is repaired by DNA polymerase
and DNA ligase.


■ What might be the biological consequences of conversion
of C into U by deamination?

□ C normally base-pairs with G, but U base-pairs with A, so
replication using DNA template strands containing U in
place of C will lead to misincorporation of A in place of
G in the new DNA strand (Figure 5.7b).
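
The consequence of an unrepaired uracil can be traced through two rounds of replication with a short Python sketch. It is an illustration only: strands are written as plain strings and paired position by position, ignoring strand polarity, and the three-base sequence and the helper name `copy_strand` are assumptions made for this example.

```python
# Base-pairing rules; U (a deaminated C) pairs with A, as it does in RNA.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}


def copy_strand(template: str) -> str:
    """Return the strand synthesised opposite `template`, position by position."""
    return "".join(PAIR[base] for base in template)


original = "ACG"                         # the middle C is paired with G in the partner strand
damaged = "AUG"                          # the C has been deaminated to U
first_round = copy_strand(damaged)       # A is inserted opposite the U
second_round = copy_strand(first_round)  # T is inserted opposite that A

print(copy_strand(original))  # TGC: the undamaged partner strand
print(first_round)            # TAC
print(second_round)           # ATG: the original C:G pair is now T:A
```

After the second round the middle position reads T on one strand and A on the other, which is the C:G to T:A replacement described in Figure 5.7b.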


Two other bases, adenine and guanine, can also be affected by
deamination. Base excision repair is initiated by the action of
DNA glycosylases, enzymes that catalyse the cleavage of a
base from the sugar–phosphate backbone. There are several
different DNA glycosylases with specificity for particular
damaged bases. In the example shown in Figure 5.8,
deamination of cytosine has resulted in the formation of a
uracil residue. The incorrect base is recognised by uracil DNA
glycosylase, and cleaved from the sugar–phosphate backbone
leaving a deoxyribose sugar lacking a base, known as an
abasic site. In a second step, the deoxyribose sugar–phosphate
is removed by nucleases, and the gap is repaired by the
combined action of DNA polymerase, which adds the correct
nucleotide, and DNA ligase, which rejoins the sugar–phosphate
backbone, leaving no gap.


Another common type of chemical damage is depurination, the
complete loss of a guanine or adenine base from the sugar–phosphate
backbone, leaving an abasic site. This type of
damage can also be corrected by the BER pathway (Figure 5.8).


UV damage and nucleotide excision repair 


The Sun’s rays include invisible ultraviolet (UV) light, which is 
categorised into UV-A, UV-B and UV-C on the basis of 
wavelength. All UV exposure causes damage, but UV-B 
(wavelength 280–315 nm) is particularly damaging to
DNA. Exposure to UV-B light leads to the formation of 
pyrimidine dimers, where adjacent cytosine or thymine residues 
undergo photochemical reactions that covalently link the two 
bases (Figure 5.9a). 


Thymine dimers interfere with normal base pairing, and also induce a 
distortion, or ‘kink’, in the DNA double helix (Figure 5.9b). These changes in 
the DNA structure prevent accurate DNA replication and transcription. 

The module learning resources include an interactive model of a
thymine dimer, which allows you to view this distortion in three
dimensions.

Thymine dimers and similar lesions are repaired by a process known as
nucleotide excision repair (NER). NER is a complex process, which in
humans relies on the products of at least 30 genes, operating in as many as 18
different protein complexes.


Figure 5.9 (a) A thymine dimer is formed by the covalent linkage of a thymine residue with a neighbouring
thymine, forming a 4-carbon ring. (b) The thymine dimer forms a kink in the double helix, clearly seen by looking at
the alignment of base pairs above and below the thymine dimer in this molecular model (the 4-carbon ring is
highlighted in orange).


In outline, repair of thymine dimers by NER is initiated when the distortion or
‘kink’ in the double helix caused by the presence of the thymine dimer is
recognised by a complex of repair proteins. The DNA strand containing the
dimer is cleaved on either side of the lesion by nucleases, and the short
single-stranded section containing the lesion is released by the unwinding and
separation of the two strands by DNA helicase (Figure 5.10). This leaves a
short gap in the strand that is filled by synthesis of new DNA using the intact
DNA strand as a template. The gap remaining at the end of the new
single-stranded section is finally sealed by DNA ligase.


■ What might be the consequence of UV-B exposure for the cells of an
individual with an inherited defect in the NER repair pathway?

□ Individuals incapable of repairing UV-induced damage such as thymine
dimers would accumulate DNA damage and mutations in the cells in the
areas of their body that are exposed to sunlight.


Xeroderma pigmentosum (XP) is a human genetic disorder in which affected
individuals are highly sensitive to UV-induced DNA damage due to defects in
NER. XP patients typically suffer from extreme sunburn, damaged skin and a
high frequency of skin tumours in areas of skin that are exposed to sunlight.
The genetics of XP is complex: there are at least seven different types,
referred to as XP-A through to XP-G, each caused by mutation of a gene that
encodes one of the proteins involved in NER. You will return to a detailed
discussion of xeroderma pigmentosum in Book 3.


Figure 5.10 The nucleotide excision repair pathway. The strand containing the
lesion, in this case a thymine dimer, is cleaved on either side of the lesion. The
cleaved section is detached by the action of DNA helicase, which separates base
pairs, and the missing section is replaced by the action of DNA polymerase and
DNA ligase.


Repair of double-strand breaks 

As described above, MMR, BER and NER all transiently generate short 
single-stranded sections of DNA which can be efficiently and accurately 
repaired by DNA synthesis, using the other intact DNA strand as a template 
for new synthesis. In contrast, the potential consequences of a break in both 
strands, a double-strand break (DSB), can be rather more severe, and will 
result in cell death if unrepaired. 


■ Why might you expect unrepaired double-strand breaks to be severely
deleterious to the cell?


□ Unrepaired DSBs could lead to fragmentation of the chromosomes.


Double-strand breaks are typically caused by replication errors or highly
energetic ionising radiation, including X-rays. They can be repaired by one of
two mechanisms. The first, which is known as non-homologous end-joining,
involves a protein complex that simply trims the two damaged ends and
ligates them together again (Figure 5.11a). This type of repair is very
‘error-prone’, because during this process several base pairs of DNA are usually lost
before the break is repaired, and if many DSBs have been induced at the same
time, there may be incorrect rejoining of DSBs from different places on the
chromosome, or even between different chromosomes.


Figure 5.11 Repair of double-strand breaks. (a) Non-homologous end-joining is error-prone as it does not use a
template to ensure the repair is accurate and it also involves the loss of a few nucleotides from the site of breakage.
(b) Homologous end-joining is a more accurate mechanism for repairing double-strand breaks. The damaged strands
pair with another intact homologous DNA molecule (the homologous chromosome in a eukaryotic cell) which then
acts as a template for DNA repair.


In contrast, in a second mechanism, known as homologous end-joining
(Figure 5.11b), the damaged DNA molecule pairs with another (undamaged)
homologous DNA molecule which acts as a template for DNA synthesis to
repair both strands accurately. Recall that the cells of diploid eukaryotes have
pairs of homologous chromosomes, so the other (undamaged) chromosome of
the pair can be used as a template for DNA synthesis to repair a DSB
(Figure 5.11b). This mechanism is therefore more accurate than
non-homologous end-joining, and is generally the cell’s preferred mechanism of
DSB repair. This repair pathway essentially uses the same molecular
mechanism as genetic recombination (crossing over) during meiosis
(Section 4.3.1). Indeed, it can sometimes result in a genetic exchange between
the two chromosomes.


5.5 The flow of information: from DNA to protein synthesis


Before going on to look at the genome composition of different organisms, 
this section will first consider the information encoded by genes, and how 
changes to the nucleotide sequence of a gene can affect its function. In 
essence, a gene is a sequence of nucleotides that encodes a product — usually 
a protein. Chapter 6 of this book deals with the process of gene expression in 
some detail, but for now gene structure will be considered in general terms, 
and the discussion restricted to protein-coding genes. Activity 5.1 (Part 2) 
continues the description of the flow of information in the cell, by describing 
the transcription of the DNA sequence to yield a messenger RNA 
intermediate, which is then translated to synthesise a polypeptide at the 
ribosome. 


(LO 5.4) Allow 30 minutes 
The second part of this activity outlines how the information encoded in DNA 
specifies the amino acid sequence of an encoded polypeptide. Again, you may 
already be familiar with this process from earlier studies, but are 
recommended to review this material before proceeding with the chapter. 


The sequence of amino acids in every protein in the cell is determined by the 
sequence of the four DNA bases — adenine (A), guanine (G), cytosine (C) and 
thymine (T) — in the gene that encodes the protein. The DNA sequence of the 
gene is first copied by RNA polymerase to produce a single-stranded 
messenger RNA (mRNA) in a process known as transcription, beginning 
and ending at specific start and termination points in the DNA (Activity 5.1, 
Part 2 and Figure 5.12). In contrast to DNA replication, only one strand of the 
DNA is used as a template for the synthesis of the mRNA. The DNA template 
strand is again read in the 3’ to 5’ direction, and the mRNA is synthesised in 
the 5’ to 3’ direction (Figure 5.12). The structure of RNA is similar to DNA; 
it has a ribose sugar backbone (whereas DNA has a deoxyribose sugar 
backbone) and while it also has four bases, these are A, G, C and U, because 
thymine (T) is replaced by uracil (U) in RNA. Uracil (like thymine) forms 
base pairs with adenine. RNA, unlike DNA, is usually single-stranded. 
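
The pairing rules used in transcription can be written out as a minimal Python sketch. The template sequence and the function name `transcribe` are hypothetical, chosen only for illustration. Because the template is read 3′ to 5′ while the mRNA grows 5′ to 3′, pairing the bases position by position gives the mRNA in its conventional 5′ to 3′ orientation.

```python
# DNA template base -> RNA base added opposite it (U takes the place of T in RNA).
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Return the mRNA (written 5' to 3') for a template strand written 3' to 5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5)


template = "TACCACGTATAATCTATG"  # hypothetical template strand, written 3' to 5'
mrna = transcribe(template)
print(mrna)  # AUGGUGCAUAUUAGAUAC
```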


The single-stranded mRNA is then used by ribosomes as the template for
protein synthesis in the process called translation (Figure 5.12 and
Activity 5.1, Part 2). During translation, the mRNA bases are recognised in
groups of three, each known as a codon, which specify one of the 20 types of
amino acid found in proteins.


Figure 5.12 The flow of genetic information. The sequence of the DNA template 
strand of a protein-coding gene is copied into a single-stranded messenger RNA 
(transcription). At the ribosome, the mRNA is used as a template for synthesis of 
the polypeptide chain (translation). Each of the triplet codons in the mRNA 
message encodes a particular type of amino acid. 


The relationship between the triplet base codons in RNA and the amino acids 
they specify is known as the genetic code (Figure 5.13). The four bases in 
RNA can be arranged in 64 different combinations of three (there are four 
possibilities for the first base of the codon (U, C, A or G), four for the 
second, and four for the third; 4 × 4 × 4 = 4³ = 64). Since there are 64
different codons and only 20 naturally occurring amino acids, it follows that 
some amino acids have multiple codons; this is sometimes referred to as 
redundancy of the genetic code. There is also a ‘start’ codon (AUG, which 
encodes the amino acid methionine) and three ‘stop’ codons (UAA, UAG and 
UGA), which signal the termination of translation (Figure 5.13) and which do 
not encode an amino acid. You will learn about the processes of transcription 
and translation in more detail in the next chapter. 


■ Using the genetic code table in Figure 5.13, what is the sequence of the
polypeptide translated from the following mRNA strand:
5′ AUGGUGCAUAUUAGAUAC 3′?


□ The polypeptide sequence would be: Met Val His Ile Arg Tyr.
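
The same decoding can be checked with a short Python sketch. It builds the standard genetic code as a lookup table and reads the mRNA three bases at a time. The names `CODON_TABLE` and `translate` are assumptions made for this example, and the sketch assumes that the reading frame starts at the first base of the sequence supplied.

```python
from itertools import product

BASES = "UCAG"
# One-letter amino acid codes for the 64 codons in UCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(mrna: str) -> str:
    """Translate an mRNA (5' to 3') codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(len(CODON_TABLE))                 # 64
print(translate("AUGGUGCAUAUUAGAUAC"))  # MVHIRY, i.e. Met Val His Ile Arg Tyr
```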


Figure 5.13 The genetic code. There are 64 possible mRNA codons (each consisting of three bases). To identify the
amino acid coded by a particular codon, select the first base from the rows on the left, the second base from the
columns along the top and the third base from the codons shown in the appropriate square. Note that AUG, the
codon for Met, is also the ‘start’ codon. There are also three ‘stop’ codons. The abbreviated names and single letter
code for the 20 amino acids found in proteins are listed (you do not need to remember these abbreviations, nor which
codons correspond to which amino acids).

Ala (A) = alanine
Arg (R) = arginine
Asn (N) = asparagine
Asp (D) = aspartate
Cys (C) = cysteine
Gln (Q) = glutamine
Glu (E) = glutamate
Gly (G) = glycine
His (H) = histidine
Ile (I) = isoleucine
Leu (L) = leucine
Lys (K) = lysine
Met (M) = methionine
Phe (F) = phenylalanine
Pro (P) = proline
Ser (S) = serine
Thr (T) = threonine
Trp (W) = tryptophan
Tyr (Y) = tyrosine
Val (V) = valine


You should by now be beginning to see how certain changes (mutations) in 
the base sequence of a gene might affect the sequence of the encoded protein. 


■ How might such mutations arise?


□ They may arise as a result of uncorrected errors during DNA replication,
or as a result of unrepaired damage, such as deamination of bases,
formation of thymine dimers or double-strand breaks.


A section of sequence that encodes a polypeptide sequence, or part of a
polypeptide sequence, is known as an open reading frame (ORF). An ORF is
therefore a series of triplet codons, uninterrupted by a stop codon. A mutation
that disrupts an ORF may change the protein product. In the next section you
will study how different types of gene mutation, in coding and also
non-coding regions, may affect the function of a gene or its
protein product.
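
As a rough illustration of what ‘uninterrupted by a stop codon’ means, the following Python sketch scans an mRNA from its first AUG and collects codons until an in-frame stop codon is met. The function name `first_orf` and the example sequence are hypothetical, and the sketch considers only the first AUG in the sequence.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def first_orf(mrna: str) -> str:
    """Return the codons from the first AUG up to (not including) an in-frame stop codon.

    If no stop codon is found, all complete in-frame codons to the end are returned.
    """
    start = mrna.find("AUG")
    if start == -1:
        return ""
    codons = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            break
        codons.append(codon)
    return "".join(codons)


print(first_orf("GGAUGGUGCAUUAACC"))  # AUGGUGCAU
```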


5.6 The consequences of mutations


The rate at which mutations – permanent changes in DNA sequence –
accumulate varies between different species, and even between different
regions of DNA in the same species. It can be as low as one in every 10¹⁰ bp
per genome per cell division, to as high as one in every 100 bp per genome
per cell division (which is common in unicellular eukaryotes and bacteria).


The mutation rate can be estimated by observing the rate at which
spontaneous changes arise in specific genes, either in a population of
organisms, or in cells from the organism growing in culture in the laboratory.
Application of both of these methods to an analysis of mutation rate in the
mouse genome has suggested an uncorrected error frequency of only one base
pair change in every 10⁹ bp for each cell generation. The mouse genome is
about the same size as the human genome (about 3 × 10⁹ bp), so there are
only a couple of new errors in each newly divided cell.


Consequently, a typical mouse gene containing about 10³ protein coding base
pairs would suffer a mutation once in about 10⁶ cell generations (i.e. 10⁹ bp/
10³ bp). A total of about 10¹² cell divisions take place during the lifetime of a
mouse. Thus, on average, every single gene would have acquired a mutation,
somewhere in a mouse’s body, on about 10⁶ (i.e. 10¹²/10⁶) occasions. The
older the mouse is, the greater the total number of mutations that will have
accumulated in its cells. This explains why the incidence of cancer in
mammals increases with age: cancer results from the accumulation in a cell of
several mutations that disrupt the function of genes that control cell division
(Book 3, Chapter 1).
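
The arithmetic behind these estimates can be laid out step by step. The Python sketch below simply restates the order-of-magnitude figures quoted in the text; the variable names are chosen for this example.

```python
bp_per_error = 10**9          # one uncorrected change per 10^9 bp per cell generation
genome_size_bp = 3 * 10**9    # approximate size of the mouse (and human) genome
gene_coding_bp = 10**3        # protein-coding base pairs in a typical gene
lifetime_divisions = 10**12   # cell divisions during the lifetime of a mouse

# New errors in the whole genome at each cell division.
errors_per_division = genome_size_bp // bp_per_error

# Cell generations needed, on average, for one particular gene to be hit once.
generations_per_gene_mutation = bp_per_error // gene_coding_bp

# Number of times that gene is mutated somewhere in the body over a lifetime.
mutations_per_gene_per_lifetime = lifetime_divisions // generations_per_gene_mutation

print(errors_per_division)              # 3
print(generations_per_gene_mutation)    # 1000000
print(mutations_per_gene_per_lifetime)  # 1000000
```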


In eukaryotes, only those mutations that occur in the germ-line, the cells that 
ultimately form the gametes and therefore contribute to the next generation, 
can give rise to heritable gene variants. Chapter 4 looked at the patterns of 
inheritance of genes in a variety of biological systems. In all these cases, the 
phenotype of a particular individual is determined by combinations of gene 
variants known as alleles. But why are some alleles of a gene dominant to 
others? Why are some mutant alleles recessive to their wild-type counterpart, 
while others are dominant to the wild type? To answer these questions, you 
will now move on to look at the nature of mutations, and at the molecular 
consequences of different types of mutation. 


5.6.1 Changes in single nucleotides 


Gene mutations that result in a simple exchange of one type of base for 
another may have a very subtle effect on a gene product. In Section 5.4.2 you 
learnt about changes in individual bases in the DNA that are caused by 
spontaneous chemical modification of a base: for example, the deamination of 
cytosine to form uracil (Figure 5.7a), and depurination (loss of a guanine or 
adenine base). 


■ From your study of this chapter, how can abasic sites such as those
derived from depurination be repaired?


□ Abasic sites are repaired by enzymes that cleave the sugar–phosphate
backbone either side of the abasic site. The resulting gap is filled in by
the action of DNA polymerase and ligase. This is base excision repair.


If a uracil residue resulting from deamination is not recognised and removed
before DNA replication occurs, A would be incorporated opposite the U at the
first round of replication; at the second round of replication that A would
pair with T, and thus the original C:G pair would be replaced by a T:A pair
(Figure 5.7b).




The second type of event to effectively bring about a base change would be 
the misincorporation of an incorrect nucleotide during replication. Transitions 
involve the substitution of one pyrimidine (C or T) by the other, or of one 
purine (A or G) by the other, while transversions involve the replacement of a
pyrimidine by a purine or vice versa. 


■ Is the mutation C changed to U a transition or a transversion?


□ It is a transition, since both cytosine and uracil are pyrimidines.
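
The distinction can be expressed as a simple rule: a substitution is a transition if both bases belong to the same chemical class, and a transversion otherwise. A minimal Python sketch of that rule follows; the function name `substitution_type` is an assumption made for this example.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T", "U"}


def substitution_type(old: str, new: str) -> str:
    """Classify a single-base substitution as a transition or a transversion."""
    valid = PURINES | PYRIMIDINES
    if old not in valid or new not in valid:
        raise ValueError("bases must be one of A, G, C, T or U")
    if old == new:
        raise ValueError("not a substitution: both bases are the same")
    same_class = {old, new} <= PURINES or {old, new} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


print(substitution_type("C", "U"))  # transition
print(substitution_type("A", "G"))  # transition
print(substitution_type("C", "A"))  # transversion
```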


Base changes within an ORF can have a number of consequences for the 
protein product of the gene. They may cause substitution of one amino acid 
for another, known as a missense mutation (Figure 5.14a), or they may 
introduce a premature stop codon, which is known as a nonsense mutation 
(Figure 5.14b). 


■ What would be the consequence of a nonsense mutation for a protein
encoded by the mutated gene?


□ The stop codon would prevent translation of the full length of the ORF
and the result would be a truncated protein product.


The more subtle change caused by a missense mutation might have a drastic 
effect on protein function if it changed an essential amino acid, for example in 
the active site of a protein (Book 2, Chapter 1). On the other hand, many base 
changes cause absolutely no change to the protein sequence. These are known 
as silent mutations. 


■ Looking at the genetic code in Figure 5.13, can you suggest why some
single base changes within an ORF might have no effect on the encoded
protein?


□ Some amino acids are encoded by several codon sequences, which often
differ only in the third base, so changes in the third base of a codon often
don’t alter the amino acid incorporated into the protein (Figure 5.14c).
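
The three outcomes of a single base change within a codon (silent, missense and nonsense) can be distinguished by comparing the amino acid encoded before and after the change. The Python sketch below does this using the standard genetic code. The function name `classify` and the example codons are chosen for illustration, and the sketch assumes that the original codon encodes an amino acid rather than a stop.

```python
from itertools import product

BASES = "UCAG"
# One-letter amino acid codes for the 64 codons in UCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify(codon: str, mutant: str) -> str:
    """Name the type of mutation that changes `codon` into `mutant`."""
    before, after = CODON_TABLE[codon], CODON_TABLE[mutant]
    if before == after:
        return "silent"
    if after == "*":
        return "nonsense"
    return "missense"


print(classify("GUG", "GUA"))  # silent: both codons encode Val
print(classify("GUG", "GAG"))  # missense: Val is replaced by Glu
print(classify("UAC", "UAA"))  # nonsense: Tyr is replaced by a stop codon
```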


Errors in DNA replication can also lead to the insertion or deletion of one or 
more nucleotides, as shown in Figure 5.14d and e. Such mutations can have a 
significant effect on the gene. Recall that nucleotides are decoded as triplet 
codons (groups of three nucleotides) during translation. Imagine that an extra 
nucleotide is inserted somewhere in an ORF encoding a section of protein. 
The consequence is that the register, or reading frame, after the insertion will 
be shifted along one nucleotide, so that the codons, and hence the amino acid 
sequence encoded, from that point onwards would be completely changed, and 
the structure of the polypeptide encoded by the gene would be significantly 
altered, with probable disruption of its function. Such mutations are known as 
frameshift mutations. 


■ Would the insertion of three nucleotides into an ORF cause a shift in
reading frame?


□ No; because the genetic code is translated in triplets, insertions or
deletions of multiples of three nucleotides do not cause a frameshift.
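
The effect of inserting one nucleotide, compared with inserting three, can be seen by translating a short example. The Python sketch below reuses the mRNA from the earlier question; the inserted bases, their position and the helper name `translate` are assumptions made for illustration.

```python
from itertools import product

BASES = "UCAG"
# One-letter amino acid codes for the 64 codons in UCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(mrna: str) -> str:
    """Translate an mRNA codon by codon from its first base, stopping at a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


normal = "AUGGUGCAUAUUAGAUAC"
plus_one = normal[:3] + "C" + normal[3:]      # one extra nucleotide after the start codon
plus_three = normal[:3] + "CCC" + normal[3:]  # three extra nucleotides at the same place

print(translate(normal))      # MVHIRY
print(translate(plus_one))    # MRAY: every codon after the insertion is changed
print(translate(plus_three))  # MPVHIRY: one extra amino acid, reading frame preserved
```

In this particular example the shifted reading frame also happens to bring a stop codon (UAG) into frame, so the frameshifted product is shorter as well as different in sequence.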




RNA polymerase or a regulatory protein) may reduce the amount of 
functional protein in the cell. 


In addition, large chromosome rearrangements (Section 5.11) occasionally 
relocate the gene to so-called ‘silent’ chromosome regions that are 
unfavourable for transcription, such as heterochromatin (Section 3.4.3). 
Insertion of a large DNA fragment can also disrupt transcription of a gene and 
often leads to a null mutation; this is typical of mutations caused by the 
mobilisation of transposable elements (Section 5.8.2). 


In contrast, mutant alleles that are dominant over wild type generally represent 
a change in function, or an increase in activity of the gene product. Gain of 
function mutations (sometimes referred to as hypermorphs) may be due to: 


• increased activity of the gene product

• increased levels of transcription

• inappropriate patterns of gene expression, which may be temporal
(i.e. when a gene is transcribed) or spatial (where a gene is transcribed);
inappropriate gene expression can have drastic effects on the development
of an organism

• large chromosomal rearrangements that move the gene’s coding region into
proximity with DNA transcriptional control sequences that normally
activate another gene.


Very rarely, a mutation can lead to a gain of gene function that is completely 
different from the original function (referred to as a neomorph). A new 
function may result from fusion of the coding regions of two different genes, 
or a sequence change that leads to the production of an aberrant protein with 
new activity. 


It would be a mistake, however, to think that genetic mutation is an ‘all or 
nothing’ phenomenon — different mutations may lead to a subtly graded series 
of phenotypes. To return to the example of the Drosophila gene white, a great 
number of different mutant alleles have been characterised in the 100 years or 
so since the first mutant allele was identified. The phenotypes caused by these 
alleles range from the pure white eyes of white alleles that cannot express any 
functional protein at all to those with only a very slight effect on the normal 
brick-red eye colour.


Summary of Sections 5.4 to 5.6 


• Uncorrected changes in DNA lead to mutations, permanent changes in
DNA sequence that will be copied and passed to daughter cells when the
cell divides. Errors resulting from replication and DNA damage are
repaired by a variety of DNA repair pathways, many of which rely on the
undamaged DNA strand to provide a template for an accurate repair.

• Groups of three nucleotides (codons) in the DNA or mRNA sequence (the
genetic code) specify the order in which amino acids will be added to the
polypeptide chain when an mRNA sequence is translated.

• Mutations that change, insert or delete bases may affect the amino acid
sequence of an encoded protein, or may alter non-coding sequences that
affect gene expression. In eukaryotes, most mutations have no effect
because they occur in regions of the genome that do not encode (or
regulate) proteins or other functional gene products.

• Mutations may create recessive (usually complete or partial loss of gene
function) or dominant (usually gain of function or novel function) alleles
of a gene.


5.7 Genes and genomes in prokaryotes 


The early research in classical genetics was driven largely by work on 
multicellular eukaryotes like Drosophila. By contrast, the early advances in 
the understanding of the molecular nature of genes and their regulation were 
made through the study of prokaryotic organisms. Principal among these were 
E. coli and the bacteriophages (viruses of bacteria) that infect it. The discussion
of gene and genome structure will therefore begin by considering prokaryotic
genomes.


5.7.1 Prokaryotic gene structure 


The genes of prokaryotes are simpler in structure than those of eukaryotes. In 
general, the nucleotide sequence that comprises the protein-coding region of a 
prokaryotic gene is an uninterrupted series of codons, beginning with a codon 
for methionine (ATG) and terminating with one of the stop or termination 
codons (TAA, TAG or TGA). Translation of the mRNA begins at the 
methionine codon, and continues until the stop codon is reached: there are no 
other ‘punctuation’ signals within the prokaryotic ORF. 


The ORF is typically preceded by a regulatory region containing DNA 
sequences that regulate the level of gene expression, and which you will study 
in more detail in Chapter 6. The regulatory region of a prokaryotic gene 
includes DNA sequences which provide the binding site for the enzyme RNA 
polymerase, which synthesises mRNA, and also sequences to which regulatory 
proteins bind. These regulatory proteins interact with the RNA polymerase to 
increase or decrease the efficiency of gene transcription, and are often 
exquisitely sensitive to the cell’s physiology. The ability of regulatory proteins 
to bind DNA often depends on the availability of nutrients in the environment, 
or molecules required in metabolic pathways within the cell. This allows the 
cell to rapidly alter gene expression in response to its requirements. 


Genes in prokaryotes are frequently arranged in units called operons. An 
operon is a group of genes with related function that are under the control of 
a single regulatory DNA region (Figure 5.15); the genes are transcribed 
together as a single mRNA called a polycistronic transcript (from cistron, an 
alternative word for gene), from which each of the individual proteins is then
translated. This contrasts with eukaryotes, in which each individual gene 
usually has its own regulatory region. 


One classic example of a prokaryotic operon, and one that is particularly well
understood, is the lac operon in E. coli. The lac operon includes three ORFs
that are transcribed as a single polycistronic mRNA. The three ORFs of the 
lac operon encode enzymes that are related to the use of the sugar lactose as a 


Figure 5.15 A bacterial operon, a group of genes controlled by one regulatory 
region. The resulting polycistronic transcript shown here is translated to produce 
three different gene products (proteins). 


carbon source. lacZ encodes a β-galactosidase (an enzyme that converts the
disaccharide lactose into the monosaccharides glucose and galactose); lacY
encodes a protein required for the import of lactose into the cell; while lacA
encodes an enzyme that is not strictly essential for lactose metabolism. The
lac regulatory region ensures that all three genes are expressed only when
lactose is available. You will look at the control of the lac operon in more
detail in Chapter 6.


5.7.2 Prokaryotic genomes and plasmids 


Most prokaryotes have a single circular chromosome; that of an E. coli
bacterium is about 4.6 × 10⁶ bp in total. While prokaryotic DNA is not
packaged with protein to the same extent as eukaryotic DNA (see below), it is 
not simply a loose circle of DNA. In fact, the prokaryotic chromosome is 
compacted by twisting of the DNA duplex, rather like coiling up a cable. 
Enzymes called topoisomerases (which you have already encountered in the 
discussion of DNA replication in Section 5.3 above) act to add or remove 
these twists. The torsional stress of coiling the circular DNA molecule causes
it to wind up on itself (Figure 5.16), a phenomenon referred to as supercoiling.


The genome of a typical prokaryote has very little ‘non-gene’ DNA; it 
consists of genes and operons separated by very short stretches of DNA. This 
is in sharp contrast with the much larger genomes of eukaryotes, where in part 
the large size is due to the much wider spacing between genes. 


In addition to the genomic chromosomal DNA described above, most 
prokaryotes have separate extrachromosomal elements, known as plasmids. 
Plasmids are small circular genetic elements (also known as episomes) that 
can replicate independently from the genomic chromosome. In many ways, 
plasmids function as small independent genomes: they generally have a single 
origin of replication, and their replication may be linked to the cell’s division 
cycle. Plasmids often carry a small number of genes associated with a specific 
set of functions. For example, plasmids often encode proteins that facilitate 
their own movement from cell to cell. F plasmids (named F for fertility factor) 
encode genes required for the bacterium to form sex pili — cell surface 



Figure 5.16 Bacterial chromosomes are compacted by supercoiling. Typically, 
prokaryotic genomes are circular DNA molecules which are supercoiled by the 
introduction of twists into the double helix. 


structures through which the plasmids, and in some cases chromosomal DNA, 
are transmitted from one cell to another during conjugation. This is one of a 
number of mechanisms of horizontal gene transfer (Section 5.7.5), which is a 
major source of microbial diversity in the wild. 


The spread of plasmids through bacterial populations can also have an impact 
on human health. Some plasmids carry genes encoding toxins; the 
enterohaemorrhagic E. coli O157:H7 carries the pO157 virulence plasmid
which encodes a potent toxin that causes diarrhoea. Plasmids may also encode 
virulence factors which increase the infectivity or harmful effects of 
pathogens, for example proteins that help bacteria to adhere to other cells, or 
to penetrate cell membranes. Plasmids have a role in the spread of antibiotic 
resistance genes in bacteria, perhaps most notoriously the genes for antibiotic-inactivating
enzymes such as the β-lactamases, which hydrolyse penicillin-like
antibiotics, preventing them from binding to their targets in bacteria. Plasmids
have proven invaluable in biological research as, among other things, they 
made possible the technique of gene cloning (Box 5.2). 


organism, so a DNA molecule derived from one species has the capacity 
to be replicated, transcribed and translated in a second species. In order 
to clone genes, three technical advances were necessary. The first 
requirement was a reproducible means of cleaving DNA into defined 
sections. This was enabled by the discovery of restriction 
endonucleases, enzymes that catalyse the cleavage of isolated DNA 
molecules at specific sequences in vitro. The second advance was the 
development of techniques for joining together sections of DNA. Joining 
DNA fragments in vitro is accomplished by the action of DNA ligase. 


What processes in living cells require DNA ligase?


Joining Okazaki fragments during lagging-strand DNA synthesis, 
and several of the mechanisms that repair DNA (Sections 5.3 and 
5.4, respectively). 


Finally, a means of propagating multiple copies of a specific DNA 
fragment was required. This can be accomplished by inserting the DNA 
fragment into a DNA molecule that has the capability to replicate 
independently within an appropriate host cell, and can be engineered in 
vitro to possess features that facilitate this function. Such molecules are
generally referred to as cloning vectors, and a large range of vectors
suitable for propagation in different types of cells have been developed 
from naturally occurring plasmids, and also bacteriophage and virus 
genomes. The process of gene cloning using a bacterial plasmid vector is 
summarised here. 


Restriction endonucleases (often referred to as restriction enzymes) 
cleave DNA strands at specific sequences. The majority of restriction 
enzymes are isolated from prokaryotes, where their natural function is to 
cleave any foreign DNA, such as bacteriophage DNA, that enters the 
cell, thereby evading infection. There are several different types of 
restriction enzyme, but the most useful in gene cloning are those that 
recognise a specific short DNA target sequence, and cut both strands of 
the DNA duplex. The target DNA sequence is usually palindromic (that 
is, it has the same sequence read in both directions), and in the case of 
the most widely used restriction enzymes, the two DNA strands are 
cleaved in a staggered fashion (Figure 5.17). 


The important thing to note from Figure 5.17 is that each of the enzymes 
shown cleaves DNA to yield DNA fragments that have ends, or termini, 
with overhanging single strands of DNA. Furthermore, each of these 
enzymes always yields DNA with single-stranded ends of the same 
sequence, which, because of the palindromic nature of the target 
sequence, are complementary. Such complementary termini are called 
cohesive (or ‘sticky’) ends because they will base-pair with each other, 
and can be permanently rejoined in vitro using the enzyme DNA ligase 
to link the sugar—phosphate backbone of the fragments. It is the 
specificity afforded by base pairing between complementary termini, and 
the ease of joining them back together, that is the key to gene cloning
techniques. Essentially, plasmid vector DNA is cleaved with a restriction 
enzyme to open up the circle, and mixed with the DNA fragment of 




interest — derived by cleavage with the same restriction enzyme. By 
incubating this mixture in the presence of DNA ligase, recombinant 
DNA molecules in which the DNA fragment of interest has been inserted 
into the circular plasmid can be obtained (Figure 5.18). 


BamHI target sequence            EcoRI target sequence

5'---GGATCC---3'                 5'---GAATTC---3'
3'---CCTAGG---5'                 3'---CTTAAG---5'

BamHI cleavage                   EcoRI cleavage

5'---G     GATCC---3'            5'---G     AATTC---3'
3'---CCTAG     G---5'            3'---CTTAA     G---5'

(a)                              (b)


Figure 5.17 Restriction endonucleases used for gene cloning can produce 
protruding single-stranded ‘sticky’ termini: for example, (a) BamHI and 
(b) EcoRI. The blue arrows indicate where the DNA strands are cleaved in 
the specific target sequences. 
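The staggered cleavage shown in Figure 5.17 can be sketched in a few lines of code. This is an illustration only: it models just the top strand as a string, the function names are hypothetical, and the cut position (after the first G of each site) is taken from the figure.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Recognition site and the number of bases left of the cut on each strand
# (read 5' to 3'); both enzymes in Figure 5.17 cut after the first G.
ENZYMES = {"BamHI": ("GGATCC", 1), "EcoRI": ("GAATTC", 1)}


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic(site: str) -> bool:
    """True if the site reads the same 5' to 3' on both strands."""
    return site == reverse_complement(site)


def sticky_end(enzyme: str) -> str:
    """Single-stranded 5' overhang left by the staggered cut."""
    site, cut = ENZYMES[enzyme]
    return site[cut:len(site) - cut]


def digest(top_strand: str, enzyme: str) -> list:
    """Split the top strand at every occurrence of the enzyme's site."""
    site, cut = ENZYMES[enzyme]
    fragments, start = [], 0
    pos = top_strand.find(site)
    while pos != -1:
        fragments.append(top_strand[start:pos + cut])
        start = pos + cut
        pos = top_strand.find(site, pos + 1)
    fragments.append(top_strand[start:])
    return fragments


if __name__ == "__main__":
    print(is_palindromic("GAATTC"), sticky_end("EcoRI"))
    print(digest("AAGAATTCTTGAATTCAA", "EcoRI"))
```

Every fragment produced by the same enzyme carries the same overhang, which is why a vector and an insert cut with one enzyme can base-pair with each other before ligation.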


The final stage in DNA cloning is the propagation of the DNA fragment 
of interest. Typically, this is done within cells of the bacterium E. coli, 
and takes advantage of the plasmid vector’s ability to replicate within a 
bacterial cell. In the laboratory, living E. coli cells can be induced by 
certain treatments to take up the recombinant plasmid DNA in a process 
known as transformation: such cells are referred to as competent cells 
(Section 5.7.5). Once inside the cell, the recombinant plasmid DNA 
replicates and provides a source of large quantities of the DNA fragment 
of interest. 


Figure 5.18 Flowchart of DNA cloning. (1) A cloning vector, in this case a 
plasmid (such as that shown in Figure 5.19), is cleaved with a restriction 
endonuclease, as is a preparation of DNA to be cloned. (2) The two DNA 
preparations are mixed to allow the cohesive termini of the DNA fragments 
to anneal. DNA ligase rejoins the sugar—phosphate backbone of the DNA 
molecules. Note that some re-ligated plasmid molecules will not contain the 
DNA of interest. (3) E. coli cells that have been pretreated to enable them to 
take up DNA are transformed with the ligated DNA, and propagated on agar 
plates containing the antibiotic ampicillin. Only the cells that contain 
plasmid can grow. If the agar plates are also supplemented with an 
appropriate chromogenic substrate for β-galactosidase (see text below), cells
containing plasmids in which lacZ is interrupted by the insertion of a DNA
segment will form white colonies, while cells containing plasmids with no
insert (and therefore a functional lacZ) will form blue colonies.
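The selection and screening logic in step (3) of Figure 5.18 amounts to a small decision table. The sketch below is a simplification with a hypothetical function name; it assumes plating on ampicillin with the chromogenic substrate present, as the caption describes.

```python
def colony_phenotype(has_plasmid: bool, has_insert: bool) -> str:
    """Expected outcome on ampicillin plates with chromogenic substrate.

    has_insert is only meaningful when has_plasmid is True.
    """
    if not has_plasmid:
        return "no growth"  # no ampicillin resistance gene, so no colony
    # An insert interrupts lacZ, so no functional beta-galactosidase is made.
    return "white" if has_insert else "blue"


if __name__ == "__main__":
    for plasmid, insert in [(False, False), (True, False), (True, True)]:
        print(plasmid, insert, colony_phenotype(plasmid, insert))
```

Ampicillin selects for cells that took up any plasmid, while colony colour then screens for those plasmids that carry an insert.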




Figure 5.19 A typical plasmid cloning vector. This circular molecule 
contains a replication origin (colE1) and a gene for ampicillin resistance
(ampᴿ). The block marked MCS (multiple cloning site) contains the
recognition sequence for several restriction enzymes. The MCS is located 
within the E. coli lacZ gene. 


These include: 

• An origin of replication. The example in Figure 5.19 has a colE1
origin of replication (derived from a naturally occurring E. coli
plasmid).

• An antibiotic resistance gene. Because transformation is inefficient,
only a small proportion of cells will take up the plasmid, so a means
of selecting those cells that have successfully taken up the
recombinant plasmid is needed. For example, by including a gene
that confers resistance to the antibiotic ampicillin in the plasmid
vector (ampᴿ in Figure 5.19), only those cells that have successfully
taken up the plasmid will grow on selective medium containing
ampicillin, while cells that do not harbour the plasmid vector will fail
to grow (Figure 5.18).

• A choice of several different restriction enzyme sites at which DNA
may be inserted (as shown in Figure 5.19). To enhance versatility,
most modern vectors can accept DNA obtained by cleavage with any
of a number of different restriction enzymes. The advent of the
polymerase chain reaction (Box 5.1) has in many cases simplified the
isolation of specific DNA sequences for gene cloning.

• A means of distinguishing recombinant plasmids, which contain an
inserted DNA fragment, from recircularised ‘empty’ vector plasmid.
For example, the plasmid shown in Figure 5.19 bears the restriction
enzyme cloning sites within a copy of the E. coli lacZ gene that
encodes β-galactosidase (Section 5.7.1 above). When no DNA is
inserted in the vector plasmid, the lacZ gene is expressed, and the
cell carrying the plasmid will synthesise β-galactosidase enzyme,


5.7.3 Prokaryotic DNA replication


Earlier in this chapter, you learnt how DNA molecules are replicated. As
you might imagine for a process that is so vitally important, the enzymes and 
other proteins involved in DNA replication are well conserved, that is, they 
have a very similar amino acid sequence in most species, from prokaryotes to 
eukaryotes. In general, prokaryotic chromosomes have a single point at which 
DNA replication is initiated, known as the origin of replication (Section 5.3). 
Replication proceeds bidirectionally from the origin of replication, as shown 
in Figure 5.20a. The initiation of replication is tightly controlled; it is linked 
to cell size and occurs once per cell division, although in rapidly dividing 
bacterial cells, a second round of replication may be initiated before the first is 
complete. 
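A rough calculation suggests why a rapidly dividing cell may need to start a second round of replication before the first is complete. The genome size is the E. coli figure given in Section 5.7.2; the fork speed (about 1000 bp per second) and the 20-minute doubling time are illustrative assumptions that are not taken from this chapter.

```python
E_COLI_GENOME_BP = 4.6e6          # from Section 5.7.2
ASSUMED_FORK_RATE_BP_PER_S = 1000  # assumption, for illustration only
ASSUMED_DOUBLING_TIME_MIN = 20     # assumption: fast growth in rich medium


def replication_time_minutes(genome_bp: float,
                             fork_rate_bp_per_s: float,
                             forks: int = 2) -> float:
    """Time to copy a circular genome with forks moving away from one origin."""
    return genome_bp / (forks * fork_rate_bp_per_s) / 60.0


if __name__ == "__main__":
    t = replication_time_minutes(E_COLI_GENOME_BP, ASSUMED_FORK_RATE_BP_PER_S)
    print(f"one round of replication: about {t:.0f} min")
    print("overlapping rounds needed:", t > ASSUMED_DOUBLING_TIME_MIN)
```

Under these assumptions one round takes longer than one doubling time, so replication must be re-initiated at the origin while the previous forks are still moving.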


Plasmids in Gram-negative bacteria usually replicate bidirectionally in much 
the same way as a bacterial chromosome (Figure 5.20a), while those in Gram- 
positive bacteria usually replicate by another mechanism called rolling circle 
replication (Figure 5.20b). Linkage of plasmid replication to the host cell’s 
cell division cycle is not always very strict, so there may be quite wide 
variation in copy number present in cells. Indeed, many plasmids used for 
gene cloning in the laboratory have been deliberately selected to have a very 
high copy number, in order to maximise the yield of recombinant DNA or of 
the recombinant proteins encoded by the plasmid. 


5.7.4 Prokaryotic genomes are similar to eukaryotic 
organelle genomes 


In Section 1.2.4, you encountered the endosymbiotic theory of the origin of 
mitochondria and chloroplasts, the organelles responsible for ATP synthesis 
and photosynthesis, respectively, in eukaryotes. Alone among the extranuclear 
organelles, mitochondria and chloroplasts possess their own genomes, which 
resemble prokaryotic genomes in that they are small and circular (Figure 5.21) 
and the DNA is not packaged with proteins in the same way as a typical 
eukaryotic chromosome. 




[Figure 5.20 labels: (a) origin; circular chromosome; replication in both directions. (b) DNA synthesised from 3′ end of the nicked strand; circle ‘rolls’, displacing original strand; many plasmid ‘lengths’ synthesised; plasmid genomes separated; circularisation and ligation; second strand synthesis if required.]


Figure 5.20 Replication in prokaryotes. (a) Bidirectional replication of a circular prokaryotic genome. Two 
replication forks proceed in opposite directions from a single origin of replication, and converge at a point opposite 
the origin. Many plasmids also replicate by this mechanism. (b) The rolling circle mechanism of plasmid replication, 
typical of Gram-positive bacteria. Replication begins at the origin where one strand is nicked, providing a starting 
point for replication of the new DNA strand around the circle. Many copies of the template are made as one long 
strand and cut into individual genomes which then recircularise and provide a template for synthesis of a second 
DNA strand to give new double-stranded plasmids. 


Mitochondrial and chloroplast genomes are substantially smaller than that of a 
‘typical’ prokaryote such as E. coli. In fact, the great majority of the proteins
in mitochondria and chloroplasts are actually encoded by the nuclear genome 
and imported into the organelle (Section 3.4); only a few are encoded in the 
organelle genome. The mitochondrial genome is particularly compact 

(Figure 5.21), with virtually no non-coding DNA (other than in the region 


[Figure 5.21 key: 2 genes encoding rRNA; 22 genes encoding tRNA; 13 genes encoding protein components.]

Figure 5.21 The human mitochondrial genome is about 1.6 × 10⁴ base pairs, and
contains 37 genes encoding some proteins, transfer RNAs (tRNAs) and ribosomal
RNAs (rRNAs).


containing the replication origin) and in several cases, the open reading frame 
of one gene overlaps with that of the next. The mitochondrial genome encodes 
mitochondrial transfer RNAs, ribosomal RNAs, and a number of protein 
components of the electron transport chain required for ATP synthesis 

(Book 2, Chapter 3). 


The genetic code shown in Figure 5.13 is universal for the nuclear genomes 
of all taxonomic groups. However, mitochondrial genomes deviate from this 
universal code. These differences mostly affect the start and stop codons 
(Table 5.1). 


Table 5.1 Differences between the ‘universal’ genetic code and the mammalian
mitochondrial genetic code.

Codon       Encoded amino acid
            Universal genetic code    Mitochondrial genetic code
UGA         stop                      tryptophan
AGA, AGG    arginine                  stop
AUA         isoleucine                methionine
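The effect of the differences in Table 5.1 can be seen by translating the same mRNA with both codes. The sketch below builds the universal code from the conventional one-letter amino acid listing (codons ordered with bases U, C, A, G) and then overrides only the four codons in the table; the function name and the example mRNA are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# One-letter amino acids for codons UUU, UUC, UUA, UUG, UCU, ... GGG ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
UNIVERSAL = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Mammalian mitochondrial code: only the codons listed in Table 5.1 differ.
MITOCHONDRIAL = {**UNIVERSAL, "UGA": "W", "AGA": "*", "AGG": "*", "AUA": "M"}


def translate(mrna: str, table: dict) -> str:
    """Translate in-frame codons from the first base until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = table[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    mrna = "AUGAUAUGAAGA"  # illustrative sequence: AUG AUA UGA AGA
    print(translate(mrna, UNIVERSAL))
    print(translate(mrna, MITOCHONDRIAL))
```

With the universal code, translation of this sequence stops at UGA; with the mitochondrial code, UGA is read as tryptophan and translation stops at AGA instead.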




Figure 5.22 This electron 
micrograph shows three bacterial 
cells during conjugation. Sex 
pili, through which DNA is 
transferred from one cell to 
another, are clearly visible. 




5.7.5 Prokaryote genetics 


Prokaryotes pass on their genes to their daughter cells through asexual binary 
fission, and the daughter cells are genetically identical clones of the parent 
cell (Activity 4.1). Bacteria and other prokaryotes do not have an equivalent 
to the sexual reproduction of eukaryotes (involving the production of gametes 
which fuse to produce offspring with new combinations of genetic material, 
Section 4.3.2). However, there are three possible mechanisms that allow 
horizontal gene transfer between two prokaryotic cells and which contribute 
to variation in prokaryotic populations. 


Conjugation 


During conjugation, one bacterium extends a tubular structure known as the 
sex pilus towards another cell (Figure 5.22), and initiates the transfer of DNA 
through the sex pilus to the recipient cell. In E. coli, conjugation requires the
donor cell to contain the F plasmid. F plasmids are quite variable in structure, but
typically contain an origin of replication, an origin at which DNA transfer is 
initiated (known as oriT) and a set of tra genes that are required to form the 
sex pilus and initiate DNA transfer. Escherichia coli strains containing F 
plasmids are referred to as F⁺ strains and those without as F⁻ strains. During
conjugation, a single strand of DNA is copied from the F plasmid in the F⁺ cell and passes
through the sex pilus to the recipient F⁻ cell, which therefore acquires a copy
of the F plasmid (Figure 5.23a). 


In some bacterial strains, F plasmids can insert or integrate into the bacterial 
chromosome (Figure 5.23b). The integrated plasmid’s oriT retains its activity, 
and upon conjugation it leads transfer of the bacterial chromosome through 
the sex pilus to the recipient cell where recombination can result in 
incorporation of donor strain genes into the chromosome of the recipient cell. 
Strains in which an F plasmid has become integrated in the chromosome are 
known as Hfr (high frequency of recombination) strains. The amount of 
chromosomal DNA that is transferred during conjugation depends on how 
long transfer can be maintained between the bacteria: sex pili are quite fragile 
and transfer is usually terminated by physical breakage of the pilus. 


[Figure 5.23 panels (a) and (b); labels: F⁺ cell with oriT; sex pilus contacts recipient; replication; plasmid transfer; two F⁺ bacteria; sex pilus breaks; transfer interrupted; homologous sequences line up; recombination.]


Figure 5.23 Conjugation in E. coli. (a) Cells carrying F plasmids form sex pili 
through which a copy of the F plasmid may be transferred to cells lacking F 
plasmids (F⁻). (b) Hfr strains contain an F plasmid that has become integrated
within the chromosome. During conjugation, DNA transfer initiates at the F 
plasmid’s oriT and because the plasmid is integrated in the chromosome, the 
transfer includes all or part of the bacterial chromosome. 


The transfer of chromosomal DNA by conjugation with an Hfr strain can be 
exploited in the laboratory to generate genetic recombination maps of the 
bacterial chromosome by ‘interrupted mating’ experiments (Figure 5.24). Cell 
suspensions from two cultures, one an Hfr strain and the other an F⁻ strain
(with different genotypes), are mixed and left to conjugate. Samples are
withdrawn at several time points, shaken vigorously to disrupt the sex pili and
terminate conjugation, and plated on solid media. The recipient F⁻ cell may,
for example, be mutant for several genes that encode enzymes required for the
synthesis of essential substances (such as amino acids or nucleotides), so the
cell is unable to grow on a ‘minimal’ medium lacking those substances (such
mutants are known as auxotrophs). Conjugation with an Hfr strain that is wild
type for these genes can transfer functional gene alleles to the F⁻ recipient cell
by recombination into the recipient’s genome, restoring these activities. The
cells resulting from conjugation are then able to grow on minimal medium.
The map is compiled by noting the time point at which transfer of wild type
alleles from the Hfr strain restores the wild-type phenotype in the F⁻ recipient. It is for
this reason that map positions in the E. coli genetic map are given in units of 
minutes. This type of recombination mapping in prokaryotes has, like the 




recombination maps in eukaryotes (Section 4.5), now largely been superseded 
by DNA sequence analysis. 


[Figure 5.24 labels: (a) Hfr donor with oriT; recipient F⁻; conjugation. (b) x-axis: time at which conjugation was interrupted/minutes (10 to 60).]


Figure 5.24 Genetic mapping in E. coli using interrupted-mating conjugation. 
(a) The Hfr donor strain (genotype A, B, N, R) can conjugate with a recipient F⁻
cell (genotype a, b, n, r), during which the F plasmid integrated in the bacterial 
genome of the Hfr strain initiates the transfer of chromosomal DNA. Transfer 
continues until the sex pilus breaks (usually by physical shearing). (b) The 
appearance of new genotypes in the recipient depends on the gene distance from 
the F plasmid insertion site. In this example, gene A transfers before B, which in 
turn transfers before N, and R transfers last. 
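The way a map is compiled from an interrupted-mating experiment can be sketched as follows. The entry times are invented for illustration (they are not read from Figure 5.24), and the function names are hypothetical; only the gene order A, B, N, R follows the caption.

```python
def gene_order(entry_times: dict) -> list:
    """Order genes by the time (minutes) at which they first appear in recipients."""
    return sorted(entry_times, key=entry_times.get)


def map_intervals(entry_times: dict) -> list:
    """Distances between neighbouring genes, expressed in minutes of transfer."""
    order = gene_order(entry_times)
    return [(a, b, entry_times[b] - entry_times[a])
            for a, b in zip(order, order[1:])]


if __name__ == "__main__":
    # Hypothetical times of first appearance of each wild-type allele.
    times = {"N": 35, "A": 10, "R": 50, "B": 20}
    print(gene_order(times))
    print(map_intervals(times))
```

Genes closest to the point where transfer starts enter the recipient first, so sorting by entry time recovers their order along the chromosome, and the differences between entry times give map distances in minutes.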


Transformation 

Bacteria are also able to take up ‘naked’ DNA from their surroundings and 
incorporate it into their genome by genetic recombination, a process known as 
transformation. This process occurs naturally in some bacteria, such as 
Streptococcus pneumoniae and Haemophilus influenzae, but most, including 




E. coli, will only take up DNA if they are treated with metal ions such as 
calcium, which increase the permeability of their outer membrane; cells 
treated in this way are said to be competent to take up DNA. Other techniques 
that alter the permeability of cell membranes include electroporation, in which 
cells are exposed to pulsed electric fields. Artificially induced transformation 
is much less efficient than natural transformation, but if sufficiently high
concentrations of DNA are used, it works adequately for experimental
purposes. 


Transformation can also be exploited for gene mapping. Competent cells will 
take up randomly sheared genomic DNA fragments, and if two genes are 
located near to each other on the bacterial chromosome, it is likely that some 
DNA fragments will contain both genes. The frequency of cotransformation 
for two genes that are close together would therefore be considerably higher 
than for two genes that are far apart. By conducting many such experiments, a 
genetic map based on cotransformation frequencies may be assembled. 
Transformation has, however, found its greatest use in gene cloning (Box 5.2), 
as a way of introducing recombinant plasmid DNA into host E. coli cells. 
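The link between gene separation and cotransformation frequency can be illustrated with a deliberately simple model. It assumes that all DNA fragments have the same length, that break points are random, and that each gene is a single point; it ignores the recombination step entirely. The function name and the numbers used are hypothetical.

```python
def cotransfer_probability(separation_bp: float, fragment_bp: float) -> float:
    """Chance that a random fragment carrying one gene also carries the other.

    Toy model: fixed fragment length, uniformly random break points,
    genes treated as points separated by separation_bp.
    """
    if separation_bp < 0 or fragment_bp <= 0:
        raise ValueError("separation must be >= 0 and fragment length > 0")
    return max(0.0, 1.0 - separation_bp / fragment_bp)


if __name__ == "__main__":
    for separation in (1_000, 5_000, 15_000, 30_000):
        p = cotransfer_probability(separation, fragment_bp=20_000)
        print(f"{separation:>6} bp apart: {p:.2f}")
```

In this model the probability falls linearly with distance and reaches zero once the genes are further apart than one fragment length, which matches the qualitative argument that closely linked genes are cotransformed more often.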


Transduction 


Transduction is the transfer of DNA from one cell to another mediated by a 
virus. When bacteriophage infect a bacterial cell, they use the host cell’s 
replication, transcription, and translation machinery to make new viral 
genomes and viral coat proteins. The viral genomes are packaged into a 
capsid of viral proteins to form new infectious particles (Section 5.9). Some 
bacteriophage are able to integrate their genome into the host chromosome, 
and there remain dormant for some time (this is called lysogeny), but when
the virus is activated to start proliferating, it excises from the bacterial 
genome, often accidentally carrying with it sections of the bacterial 
chromosome, which become copied and packaged into the new virus particles. 
This type of gene transfer has contributed to the transfer of antibiotic 
resistance and virulence factors between bacterial strains. 


Summary of Section 5.7 


• The typical prokaryotic gene includes a continuous protein-coding open
reading frame (ORF), with associated regulatory sequences to which RNA
polymerase and various regulatory proteins bind.

• The genes in prokaryotes are often grouped in operons, which enable
coordinately regulated expression of genes with related function.

• Typically, prokaryotic genomes are small and circular. Replication of the
prokaryotic genome is initiated from a single origin of replication.

• Chloroplast and mitochondrial genomes share features with prokaryotic
genomes (they are circular, and not compacted into chromatin), which is
thought to reflect their origins in endosymbiosis.

• Plasmids are small circular DNA molecules that can replicate
independently of the genomic DNA. 




• Prokaryotic genomic DNA can undergo horizontal gene transfer between
cells by the mechanisms of: conjugation (involving physical contact); 
transformation (involving transfer of naked DNA); or transduction 
(involving bacteriophage). These mechanisms contribute to variation in 
prokaryote populations and also the spread of antibiotic resistance and 
virulence factors. 


5.8 Eukaryotic genes and chromosomes 


In contrast to prokaryotes, the genes and genomes of eukaryotes are generally 
far larger and more complex. In the early days of molecular biology, the 
prevailing expectation was that eukaryotic genome size would reflect the 
number of genes. However, it was quickly realised that genome size did not 
correlate with perceived organismal complexity; for example, many 
amphibians and flowering plants have genomes significantly larger than those 
of other multicellular organisms such as reptiles and mammals (Figure 5.25). 


[Figure 5.25 chart: taxonomic groups shown are mycoplasma, Gram-positive bacteria, Gram-negative bacteria, fungi/moulds, algae, worms, crustaceans, echinoderms, insects, molluscs, birds, bony fish, cartilaginous fish, reptiles, mammals, amphibians and flowering plants; x-axis: genome size in base pairs.]


Figure 5.25 A chart showing the variation in genome sizes across several 
taxonomic groups. Note the x-axis is a logarithmic scale (each interval is ten times 
greater than the previous one). Most mammalian genomes are around 3 × 10⁹ bp.


The extensive variation in eukaryotic genome size was referred to as the 
C-value paradox (C being the amount of DNA in the genome). Subsequently 
the data from a number of eukaryotic genome sequencing projects revealed 
that gene numbers did not correlate with genome size, and were usually 
somewhat lower than expected. Early predictions of the number of human 




genes were in the hundreds of thousands or even millions, but the completed
genome sequence suggests that there are only around 20 000-25 000 genes, a
surprisingly low figure given the large 3 × 10⁹ bp haploid genome size.
In comparison, the fruit fly Drosophila has around 14 000 genes in its
1.7 × 10⁸ bp haploid genome, while the worm C. elegans has around 19 000
genes in its 10⁸ bp genome. In fact, the human genome isn’t unusually large.
Figure 5.25 shows the range of genome sizes in some different taxonomic 
groups. What then is the explanation for the large size of eukaryotic genomes 
in comparison with the number of genes? 
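The mismatch between genome size and gene number is easier to see as an average amount of DNA per gene. The short calculation below uses only the approximate figures quoted above; the dictionary and function names are illustrative.

```python
# (haploid genome size in bp, approximate gene number), as quoted in the text.
GENOMES = {
    "human (lower gene estimate)": (3e9, 20_000),
    "human (upper gene estimate)": (3e9, 25_000),
    "Drosophila": (1.7e8, 14_000),
    "C. elegans": (1e8, 19_000),
}


def bp_per_gene(genome_bp: float, genes: int) -> float:
    """Average length of genome per gene, in base pairs."""
    return genome_bp / genes


if __name__ == "__main__":
    for name, (size, genes) in GENOMES.items():
        print(f"{name}: about {bp_per_gene(size, genes):,.0f} bp per gene")
```

On these figures the human genome has roughly ten times more DNA per gene than Drosophila and more than twenty times more than C. elegans, which is the discrepancy the rest of this section sets out to explain.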


5.8.1 Eukaryotic genes are more complex than those 
of prokaryotes 


The first point to consider is the size of a typical eukaryotic gene. A schematic
view of gene structure in eukaryotes is shown in Figure 5.26. In general, 
eukaryotic genes are not organised into operons; instead, each gene has its 
own regulatory DNA sequences. Another striking difference from prokaryotic 
genes is that the protein-coding region of a eukaryotic gene is usually 
interrupted by intervening non-coding sequences called introns; the coding 
sections are known as exons. Introns are removed by a post-transcriptional 
process known as mRNA splicing, to yield the mature messenger RNA 
(mRNA) in which the exons are contiguous and form the ORF required for 
protein translation (Chapter 6). 
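The outcome of splicing can be sketched by treating the primary transcript as a string and the exons as coordinate ranges. The sequence, the coordinates and the function name below are hypothetical, and the intron is written in lower case purely to make it visible; the sketch says nothing about how the cell recognises intron boundaries.

```python
def splice(pre_mrna: str, exons: list) -> str:
    """Join exons, given as (start, end) half-open intervals, in order."""
    return "".join(pre_mrna[start:end] for start, end in exons)


if __name__ == "__main__":
    # Hypothetical transcript: exon 1 (6 bases), intron (12 bases), exon 2 (6 bases).
    pre_mrna = "AUGGCC" + "guaaguuuucag" + "UUUUAA"
    exons = [(0, 6), (18, 24)]
    print(splice(pre_mrna, exons))
```

After the intron is removed the exons are contiguous, and the joined sequence reads as a single ORF from AUG to the stop codon.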


[Figure 5.26 labels, from upstream to downstream: distal regulatory region; proximal promoter; TATA box; transcription start; transcription termination sequences. Scale marker: ~50 000 bp. Shown below the gene: mature mRNA.]


Figure 5.26 Structure of a typical eukaryotic gene. The coding region shown has five coding exons (dark purple) 
with intervening non-coding regions (introns, light purple). About 30 base pairs (-30 bp) ‘upstream’ of the
transcription start site is the TATA box (red) where RNA polymerase binds, and further upstream (-250 bp) is a
proximal promoter region (pink) where protein factors that regulate transcription bind. There may also be more distal 
(remote) regulatory regions many thousands of bases upstream (or sometimes downstream) of the coding region to 
which further regulatory factors bind. Downstream of the coding region are sequences that regulate transcriptional 
termination (green). 


Introns may be very long, and very numerous. For example, the human gene 
that encodes a protein called dystrophin, which is mutated in the genetic 
disorder Duchenne muscular dystrophy, is the longest known human gene. It 
is about 2.4 million (2.4 × 10⁶) bp in length with 79 exons, but its mature
transcript after splicing is only 14 kilobases (1.4 × 10⁴ bp) long. The gene is
so large that it is estimated that it takes 16 hours to transcribe — a particularly 
lengthy process compared with the duration of a typical cell division cycle 
(which in a cultured mammalian cell is about 20 hours). The existence of 




introns permits another type of subtle regulation of gene activity: in addition 
to regulation at the level of transcription, several different forms of a protein 
may be derived from a single gene by splicing together different combinations 
of exons (Chapter 6). 
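The dystrophin figures quoted above can be turned into two simple ratios: the average transcription rate they imply, and the fraction of the primary transcript that survives splicing. The calculation uses only the numbers given in the text; the variable names are illustrative.

```python
GENE_BP = 2.4e6             # length of the dystrophin gene
MATURE_MRNA_BP = 1.4e4      # length of the spliced transcript
TRANSCRIPTION_HOURS = 16    # estimated time to transcribe the gene
CELL_CYCLE_HOURS = 20       # typical cultured mammalian cell

mean_rate_bp_per_s = GENE_BP / (TRANSCRIPTION_HOURS * 3600)
fraction_retained = MATURE_MRNA_BP / GENE_BP
fraction_of_cycle = TRANSCRIPTION_HOURS / CELL_CYCLE_HOURS

if __name__ == "__main__":
    print(f"mean transcription rate: about {mean_rate_bp_per_s:.0f} bp per second")
    print(f"retained after splicing: about {fraction_retained:.2%}")
    print(f"share of one cell cycle: {fraction_of_cycle:.0%}")
```

On these numbers, more than 99% of the primary transcript is intron sequence that is removed, and a single round of transcription occupies most of one cell division cycle.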


There is a wide variety of regulatory DNA sequences both upstream and 
downstream of, and even sometimes within, a eukaryotic gene. These elements 
typically include the promoter region, which lies just upstream of the 
transcriptional start of the gene, where transcription factors assemble to 
promote binding of the RNA polymerase. There are also numerous more 
distant regulatory elements which act in concert to regulate gene expression in 
response to various types of signal that the cell receives from its environment 
(Chapter 6). 


Multiple-copy genes 


Not all eukaryotic genes are present in just one copy per haploid genome: 
some genes, particularly those that encode cellular components required in 
large quantities, may be present in multiple copies. Examples include genes 
for the histone proteins involved in the packaging of DNA into chromatin, and 
the genes encoding the RNA components of the ribosomes (rRNAs), which 
are present in multiple copies clustered together in large arrays
(Section 3.4.3). Such multiple-copy genes generally display very little
sequence variation between individual gene copies, implying that mechanisms 
exist to prevent the accumulation of mutations in these important genes. 


Other multiple-copy genes may be members of gene families — genes with 
related but often distinct roles, which have similar sequences reflecting their 
shared evolutionary origin. Examples of gene families and their origins are 
discussed later in this section. 


5.8.2 Eukaryotic genome composition 


Figure 5.27 illustrates the composition of the human genome. Notice that only 
a small proportion of the DNA in the human genome, about 1.5%, indicated
by the dark purple segment, codes for protein or functional RNA. From the
description of eukaryotic gene structure above, you will
already appreciate that the overall size of a gene can be very much larger than 
its protein-coding capacity would indicate, due to the presence of introns and 
the extensive sequences that play an important role in gene regulation. These 
sequences comprise about 24% of the genome and are collectively indicated 
by the pale purple segment in the figure. 


Perhaps what is more surprising is the large proportion of the genome that is 
composed of repeated DNA sequences. Some of this (15.5%) comprises 
multiple repeats of short sequences, which are often clustered in a particular 
chromosomal location. The rest of the repetitive DNA (44%) is more complex 
and has a more variable distribution. This category largely consists of 
transposable elements, or transposons (described below); these are DNA 
sequences that, in certain circumstances, are able to excise and reintegrate 
elsewhere in the genome. The remainder of the genome comprises
unique non-coding sequences of unknown function; however, at least some of
these sequences have been conserved between species over millions of years
of evolution, and some of them are transcribed, suggesting that rather than
being ‘junk’ DNA they have as yet undiscovered functions.


[Figure 5.27 pie chart segments: exons (protein, rRNA or tRNA coding) 1.5%; introns and regulatory sequences 24%; repetitive DNA, including transposable elements 44%; repetitive DNA, other than transposable elements 15.5%; unique non-coding DNA of unknown function 15%.]

Figure 5.27 A pie chart showing the estimated composition of the human genome
(total about 3 × 10⁹ bp). Only about 1.5% of the total DNA content actually
encodes protein or functional RNA sequences.
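To get a feel for what these percentages mean in absolute terms, the short sketch below converts each category quoted for Figure 5.27 into an approximate number of base pairs. It assumes the rounded genome size of 3 × 10⁹ bp given in the figure caption; the function and variable names are illustrative only.

```python
# Approximate human genome size and category percentages as quoted for Figure 5.27.
GENOME_SIZE_BP = 3e9

composition_percent = {
    "exons (protein, rRNA or tRNA coding)": 1.5,
    "introns and regulatory sequences": 24.0,
    "repetitive DNA, including transposable elements": 44.0,
    "repetitive DNA, other than transposable elements": 15.5,
    "unique non-coding DNA of unknown function": 15.0,
}


def segment_sizes(genome_size_bp, percentages):
    """Return the approximate number of base pairs in each category."""
    return {name: genome_size_bp * pct / 100 for name, pct in percentages.items()}


sizes = segment_sizes(GENOME_SIZE_BP, composition_percent)
for name, bp in sizes.items():
    print(f"{name}: about {bp:.2e} bp")

# The five categories should account for the whole genome.
total_percent = sum(composition_percent.values())
print(f"total: {total_percent}%")
```

On these rounded figures the exons amount to only a few tens of millions of base pairs, whereas the repetitive DNA runs to well over a billion.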


Transposable elements 


So far in this book, genes have been considered as having a fixed locus 
(chromosomal location). However, some DNA sequences are not always found 
in the same position in individuals of the same species, but have the 
remarkable property of being mobile; that is, they can move from one locus to 
another within a genome. Essentially, these mobile elements can be thought of 
as molecular ‘parasites’ which often appear to have no specific function other 
than to maintain themselves. Such transposable elements (also referred to as 
transposons) are widespread, and appear to exist in all organisms, both 
prokaryotes and eukaryotes. They move by the process of transposition, a ‘cut 
and paste’ mechanism that depends on breakage and rejoining of the DNA 
strands. 


The P element transposon, found in the fruit fly Drosophila, is the best 
studied of all transposons. The complete P element is 2.9 kilobases in length 
and is usually present in multiple copies, about 30-50 per cell, in so-called 
‘carrier strains’ of Drosophila. The P element sequence is bounded by short 
DNA repeats which are essential to the transposition process, as is the single 
gene carried by the P element, which encodes an enzyme called a transposase. 
The transposase binds to the short repeat sequences at each end of the element 
and catalyses the excision and reintegration of the transposon in another 
location. There are many more classes of transposon for which the 
mechanisms of transposition vary, sometimes resulting in the multiplication of 


Chapter 5 Genes and genomes 


187 


Generating Diversity 


188 


transposon copies within a cell. Some transposons appear to be closely related 
to viruses. 


■ Suggest a likely consequence of a transposable element insertion within a
gene’s ORF.


□ It would probably disrupt expression of the gene, causing a mutation,
most likely a loss-of-function mutation (Section 5.6.2).
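The effect of such an insertion can be illustrated with a toy example. In the sketch below, the ORF and the inserted ‘element’ are short invented sequences (real transposons are thousands of bases long), and the helper names are hypothetical. The code reads codons in frame until the first stop codon, before and after the insertion.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons_before_stop(seq):
    """Read seq in frame from its first base and return the codons
    that precede the first stop codon."""
    codons = []
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            break
        codons.append(codon)
    return codons


def insert_element(orf, element, position):
    """Return the ORF with the element inserted after `position` bases."""
    return orf[:position] + element + orf[position:]


# Invented sequences, for illustration only.
orf = "ATGGCTGAAACCGGTCTGAAATAA"   # 7 codons followed by a stop codon
element = "CAGGTAGCTTAACCTG"       # 16 bases: shifts the frame and carries a stop

disrupted = insert_element(orf, element, 9)

print(codons_before_stop(orf))
print(codons_before_stop(disrupted))
```

In this example only the first three codons of the original ORF survive; the reading frame then runs into the inserted sequence and meets a premature stop codon, so the full-length product can no longer be made.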


Simple sequence repeats 


A considerable proportion of a typical eukaryotic genome comprises 
thousands of short repeat sequences. These are often clustered around 
structural features of chromosomes such as the centromere, the region where 
sister chromatids attach to the spindle fibres during cell division
(Activity 4.1). Such short sequence repeats are often referred to as satellite
DNA, a name derived from experiments in which satellite DNA was found to
separate from the non-repetitive DNA on the basis of its density during
centrifugation. The term satellite DNA is often
used interchangeably with the term heterochromatin, which describes sections 
of the genome that remain highly condensed during interphase, and which 
generally contain few genes. 


5.8.3 Eukaryotic chromosome replication and telomeres 


Earlier in this chapter, the mechanisms of DNA replication were discussed. In 
the case of circular DNA molecules, such as the typical prokaryotic genome, 
divergent replication forks originating from a single point of origin will 
eventually converge at a point approximately opposite the origin, leading to 
the complete replication of the entire circular molecule. In contrast, a typical 
eukaryotic genome consists of linear DNA molecules, which presents a 
problem in completing DNA replication. Although the leading strand can be 
copied right up to the end of the chromosome (Section 5.3), the situation 
differs for the lagging strand. There is a short gap at the extreme 5' end of
the newly synthesised lagging strand (opposite the 3' end of its template),
left by degradation of the final RNA primer, which cannot be filled in by DNA
polymerase because there is no free 3'-OH available to initiate DNA
synthesis (Figure 5.28a). Thus, during each round of
DNA replication, the length of each chromosome decreases very slightly. This 
shortening over many cell divisions would eventually cause problems if 
coding regions became truncated. 


The problem is overcome by the existence of a protective region of multiple 
short nucleotide repeats known as a telomere, which is present at the ends of 
each chromosome (Figure 5.28b). In human chromosomes, the short repeat 
sequence (TTAGGG)n is repeated many thousands of times in the telomere.


■ As a short segment at the end of each lagging strand cannot be replicated,
what happens to the telomere regions during repeated cell divisions?


□ The telomeres themselves gradually get shorter.


[Figure 5.28a labels: leading strand; lagging strand; parental DNA; newly synthesised DNA; RNA primer. Steps shown: removal of RNA primers, then DNA polymerase fills in gaps with DNA.]

Figure 5.28 Telomeres protect the ends of DNA. (a) Because DNA synthesis
requires priming, lagging-strand synthesis omits a few nucleotides at the termini of
linear eukaryotic chromosomes. (b) Human chromosomes stained to show DNA
(blue) and the repetitive DNA found at telomeres (pink).


The gradual shortening of telomeres over time in dividing somatic cells has
been suggested as one of many theories for why organisms age. In cell
culture, eukaryotic cells only divide for a certain number of generations
(30-50) before they enter a state of cellular senescence in which the cells remain
alive but are unable to proliferate any further. This seems to be related to the
gradual reduction in the length of their telomeres. In certain types of
eukaryotic cells, the problem of telomere shortening is overcome by a special
enzyme known as telomerase, which adds additional nucleotide repeats to the
end of the chromosomes, thereby constantly restoring telomere length.
Telomerase is very active in stem cells, a class of ‘immortal’ cells in
multicellular organisms. Stem cells are able to divide continuously to provide
a source of the cells that differentiate into specialised somatic cell types
(Book 3, Chapter 1).
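The link between telomere shortening and a limited number of divisions can be sketched as a simple counting model. The numbers used below (starting length, loss per division and critical length) are illustrative assumptions chosen only to show the logic; they are not values given in this chapter, and the function name is hypothetical.

```python
def divisions_until_senescence(initial_bp, loss_per_division_bp, critical_bp,
                               telomerase_gain_bp=0):
    """Count how many divisions a cell line can complete before its telomere
    would fall below the critical length.

    Returns None when telomerase restores at least as much as is lost,
    i.e. when the model cell line never reaches senescence.
    """
    net_loss = loss_per_division_bp - telomerase_gain_bp
    if net_loss <= 0:
        return None
    length = initial_bp
    divisions = 0
    while length - net_loss >= critical_bp:
        length -= net_loss
        divisions += 1
    return divisions


# Assumed, illustrative values (not taken from the text).
somatic = divisions_until_senescence(initial_bp=10_000,
                                     loss_per_division_bp=100,
                                     critical_bp=6_000)
stem_like = divisions_until_senescence(initial_bp=10_000,
                                       loss_per_division_bp=100,
                                       critical_bp=6_000,
                                       telomerase_gain_bp=100)
print(somatic)    # finite number of divisions
print(stem_like)  # None: telomere length is maintained
```

With these assumed values the model cell without telomerase stops after a number of divisions within the 30-50 range mentioned above, while the cell in which telomerase fully compensates for the loss can divide indefinitely.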


■ In which other types of cells might you expect telomerase to be active?


□ In germ-line cells, which in eukaryotic organisms proliferate throughout
life to form the gametes that carry the hereditary material into the next
generation.


Telomerase genes also become activated in many cancer cells, which may as a 
result become immortal and continue to divide inappropriately, causing 
tumours. The majority of somatic cells, on the other hand, have no telomerase 
activity. While this may make them vulnerable to ageing, it may also 
minimise the development of cancers, by preventing uncontrolled cell 
proliferation. 


5.9 Viral genomes 


Viruses are, strictly speaking, not organisms since they depend absolutely on a 
host cell to provide most of the biochemical functions they require in order to 
proliferate. This is the only unifying feature of viruses: as a group, they vary 
widely in genome structure and size, and in the details of their propagation 
cycles. 


While all prokaryotic and eukaryotic organisms use double-stranded DNA as 
their genetic material, viruses have a much wider diversity of genome 
structure. Viral genomes may be composed of single- or double-stranded RNA 
or DNA in one of several structural forms. Most viral genomes encode ‘coat’ 
proteins which enclose the genome in a protective capsid when they escape 
from the host cell to infect further cells. Some viral genomes also encode 
specific biochemical functions required for their propagation. For example, 
retroviruses make a DNA copy of their RNA genome after infecting a host 
cell in a process called reverse transcription. The DNA copy of the retroviral 
genome then inserts into the host cell genome where it is later transcribed to 
produce new viral RNA genomes. The virally encoded enzyme responsible for 
the synthesis of DNA from an RNA template, reverse transcriptase, has 
become an important part of the laboratory toolkit of enzymes used in 
manipulating nucleic acids in vitro. 


Some viruses, such as the influenza virus, have genomes that are in several 
separate segments and, if an organism is infected by more than one strain of 
the virus, re-assortment between the genome segments can lead to the sudden 
appearance of new and virulent strains. This has been the case for several
epidemics of highly virulent human flu viruses that contained genes from
viruses that normally infect other organisms, such as swine or avian flu virus.
You will read more about the interaction of viruses and host cells, and how 
this causes disease, in Book 3, Chapter 3. 


Summary of Sections 5.8 and 5.9 


• In general, the genes and genomes of eukaryotes are larger and more
complex than those of prokaryotes.


• Eukaryotic genes have a greater range of regulatory sequence elements (to
which proteins bind and regulate transcription), and their ORFs usually
consist of several short coding sections (exons) interrupted by non-coding
regions (introns) which must be removed by RNA splicing to produce a
mature mRNA for translation.


• Eukaryotic genomes are composed of multiple linear chromosomes, which
have multiple replication origins and have special structures, telomeres, at
their termini to protect them against gene loss due to incomplete
replication of the lagging strand.


• The genomes of eukaryotes contain large amounts of DNA with little
apparent function, including repetitive sequences and transposable elements
(transposons).


• Viruses are not regarded as organisms, and depend on the host cell for
most of the biochemical functions they require to proliferate. Viruses have
a wide diversity of genome structures including RNA as well as DNA
genomes.


5.10 Genome sequencing and genomics 


For many years, researchers were only able to estimate the numbers of genes 
in more complex genomes by extrapolating from the simpler, better 
characterised systems, which led to estimates that erred very much on the high 
side. Advances in laboratory techniques for DNA cloning and DNA 
sequencing have since enabled scientists to determine the complete genome 
sequences of a growing number of species. As more genomes are sequenced, 
they add to a rich set of data that increasingly informs the biological 
understanding of evolutionary relationships between taxonomic groups, based 
on similarities between gene sequences and the patterns of genome 
rearrangements that have occurred through time. 


5.10.1 A history of genome sequencing 


The early efforts to sequence genomes were limited by the technology of the 
time to small genomes, such as those of bacteriophages and mitochondria. By the
1990s, instruments that automated the collection of sequence data had 
significantly speeded up the process. The first organism to have its genome 
fully sequenced was the bacterium Haemophilus influenzae in 1995, and the 
first eukaryote was Saccharomyces cerevisiae (baker's yeast) in 1996. 
Sequencing of the human genome began in 1990; a draft sequence was 
available by 2000, and the ‘complete’ sequence by 2003. Continued technical 
development has significantly increased the speed and reduced the expense of 
genome sequencing (Box 5.3). 



Smaller fragments move more easily through the gel matrix (which acts 
rather like a sieve), so they move further away from the origin towards 
the other end of the gel, while larger DNA fragments move less far. By 
choosing the gel composition appropriately, gel electrophoresis can easily 
discriminate between fragments that differ in length by as little as a 
single nucleotide. 


The detailed method and technical implementation of DNA sequencing is 
complicated, and subject to continual advances in instrumentation. For 
the purposes of this chapter, the details are not as important as the 
principle. In order to appreciate how the technique works, you need to 
recall some of the features of DNA structure and DNA replication. 
Earlier in this chapter, you learnt how DNA is replicated by successive 
linkage of nucleotides to a growing strand via complementary base 
pairing to a single-stranded template DNA molecule. Crucially, the 
synthesis of a new DNA chain can be terminated if a defective 
nucleotide is incorporated into the growing DNA chain. 


The principle of the most commonly used method, called chain
termination sequencing, is to carry out DNA replication in vitro, but to
‘spike’ the reaction mix with a small amount of ‘defective’ forms of
dNTPs called ddNTPs (dideoxynucleoside triphosphates, or more
specifically ddATP, ddTTP, ddGTP or ddCTP). The ddNTPs are DNA
chain terminators: when they are incorporated into the DNA chain no
further nucleotides can be added because they lack the 3'-OH group
required for the formation of a covalent phosphodiester bond with
another nucleotide.


For chain termination sequencing, four in vitro replication reaction 
mixtures are prepared, each containing the template DNA, a short DNA 
primer, a mixture of the four dNTPs, DNA polymerase and a small 
amount of one of the ddNTPs, e.g. dideoxyadenosine triphosphate 
(ddATP). Whenever the elongating DNA chain incorporates a ddATP 
(instead of a dATP), synthesis terminates. By ensuring that the 
concentration of ddATP in the reaction is very low relative to the 
concentration of dTTP, dATP, dCTP and dGTP, a range of DNA 
fragment lengths will be synthesised, all of them terminating with the 
incorporation of a ddATP. In other words, each of the newly synthesised 
fragments will terminate at one of the points where there is a T in the 
DNA template sequence (Figure 5.29a). Three similar reactions are set 
up containing ddTTP, ddCTP, or ddGTP, which will thus provide a range 
of fragments terminating where A, G and C, respectively, occur in the 
template sequence. The reactions terminating with incorporation of 
ddATP, ddGTP, ddTTP and ddCTP can be referred to as the A, G, T and 
C reactions, respectively (Figure 5.29b). 
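The set of fragments produced in each reaction can be worked out from the template sequence alone. The sketch below does this for the template shown in Figure 5.29, under the same simplification as the figure (the primer is ignored, so synthesis is treated as starting opposite the 3' end of the template). The function names are illustrative.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesised_strand(template_5to3):
    """Return the newly synthesised strand, written 5'->3'
    (the reverse complement of the template)."""
    return "".join(COMPLEMENT[base] for base in reversed(template_5to3))


def terminated_fragments(template_5to3, dd_base):
    """Return every fragment (5'->3') that ends with the given dideoxy base,
    i.e. the products of the reaction spiked with that ddNTP."""
    new_strand = synthesised_strand(template_5to3)
    return [new_strand[:i + 1]
            for i, base in enumerate(new_strand)
            if base == dd_base]


template = "ACATAATTGGGACTAT"  # template strand from Figure 5.29, 5'->3'

a_fragments = terminated_fragments(template, "A")
for fragment in a_fragments:
    # Written 3'->5', as the fragments are drawn beneath the template in the figure.
    print(f"{fragment[::-1]}-5'   ({len(fragment)} nucleotides)")
```

Each fragment from the ‘A’ reaction ends opposite a T in the template, and the fragment lengths record how far along the new strand each A lies.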


[Figure 5.29 (a)]

template strand of DNA to be sequenced:
5'-ACATAATTGGGACTAT-3'

fragments synthesised by DNA polymerase in the ‘A’ reaction, each terminating in ddATP:
                  A-5'
                ATA-5'
          ACCCTGATA-5'
         AACCCTGATA-5'
      ATTAACCCTGATA-5'

[Figure 5.29 (b): separation of the fragments by gel electrophoresis, with larger fragments towards the top of the gel and smaller fragments towards the bottom; the sequence of the original template strand is deduced from the order of the bands.]


Figure 5.29 Chain termination sequencing. (a) The principle by which the 
‘A’ reaction generates a mixture of fragments ending in ddATP, representing 
each place in the template DNA where there was a T. (b) Four DNA 
synthesis reactions containing a mixture of DNA fragments terminating in 
ddGTP, ddATP, ddTTP or ddCTP are separated by gel electrophoresis in 
separate lanes as indicated. Smaller fragments move further down the gel. 
The sequence of the synthesised strand can thus be read from the bottom to 
the top of the gel. 


In order to ‘label’ the fragments and make them visible after gel 
electrophoresis, a small amount of dNTPs labelled with radioactivity can 
also be added to each of the four reactions to be incorporated into the 
newly synthesised DNA fragments. 


The reaction products are then separated by gel electrophoresis, as shown 
in Figure 5.29b and the separation pattern of the newly synthesised DNA 
fragments is revealed by exposing the gel to X-ray film, which darkens 
where it is exposed to radioactivity. The DNA sequence can be ‘read’ 
from the order of the bands on the gel. 
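Reading a gel amounts to sorting the bands by fragment length and noting which lane each band lies in. The sketch below assumes an idealised gel in which every fragment length gives exactly one band in exactly one lane; the band positions are those expected for the template in Figure 5.29 with the primer ignored, and the function names are illustrative.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def read_gel(lanes):
    """Read the synthesised strand (5'->3') from the bottom of the gel upwards.

    `lanes` maps each terminator base to the fragment lengths seen as bands
    in that lane; the shortest fragment has travelled furthest.
    """
    band_to_base = {}
    for base, lengths in lanes.items():
        for length in lengths:
            band_to_base[length] = base
    return "".join(band_to_base[length] for length in sorted(band_to_base))


def template_from_read(read_5to3):
    """The template is the reverse complement of the strand that was read."""
    return "".join(COMPLEMENT[base] for base in reversed(read_5to3))


# Idealised band positions (fragment lengths in nucleotides) for each lane.
lanes = {
    "A": [1, 3, 9, 10, 13],
    "T": [2, 5, 11, 12, 14, 16],
    "G": [4, 15],
    "C": [6, 7, 8],
}

read = read_gel(lanes)
print(read)
print(template_from_read(read))
```

The sequence read directly from the gel is that of the newly synthesised strand; taking its reverse complement recovers the original template.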


DNA sequencing technology has continually improved to enable higher 
throughput. Nowadays, automated sequencing uses ddNTP terminators 
labelled with different coloured fluorophores, so that all four reactions 




The chain termination method of DNA sequencing is limited by the size- 
resolving capacity of gels and can elucidate only a few hundred base pairs of 
sequence in each gel run. This is very small in comparison to the size of even 
a prokaryotic genome, so a great many sequencing gels are required to 
determine a complete genome. Most genome sequencing projects take a 
‘shotgun’ approach, breaking up genomic DNA into random small fragments, 
which are used to determine a large collection of short DNA sequences, each 
of which overlaps with some other sequences. The sequences are then 
analysed, sorted and reconstructed by computer, using software that compares 
each short segment of sequence against all the others, and generates larger 
contiguous sequences based on overlapping short sequences. Typically, each 
base pair of a finished genome sequence will have been sequenced many 
times, being represented in numerous overlapping sequences. Once the 
genome of a particular species has been sequenced, this sequence can be used 
as a ‘scaffold’ to much more quickly assemble genome sequences of other 
individuals of the same species. This has been an important factor in 
developing high-speed sequencing of human genomes. 
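The reconstruction step can be illustrated with a very small ‘greedy’ assembler, which repeatedly merges the two sequences with the longest overlap. This is a teaching sketch under strong assumptions (error-free reads, all from the same strand, no repeated regions); real assembly software is far more elaborate, and the reads and function names here are invented for illustration.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that matches a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads, min_overlap=3):
    """Merge reads pairwise, always choosing the largest overlap available."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Discard reads wholly contained in another read.
    contigs = [r for r in contigs
               if not any(r != other and r in other for other in contigs)]
    while len(contigs) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i != j:
                    size = overlap_length(left, right, min_overlap)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # no remaining overlaps: leave the contigs separate
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]
    return contigs


# Three invented overlapping reads, deliberately given out of order.
reads = ["GGGACTAT", "ACATAATT", "TAATTGGGA"]
print(greedy_assemble(reads))
```

Because every position is normally covered by several overlapping reads, an approach of this kind can rebuild long contiguous sequences; repeated DNA is what makes the real problem hard, since identical overlaps can arise from different places in the genome.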


The rapid advances in sequencing technology have made genome sequencing 
both quicker and less expensive. Not long before the start of the Human 
Genome Project in 1990, typical sequencing projects cost approximately $1 
per base pair of ‘finished’ genome sequence (thus a genome of 3 × 10⁹ bp
would have cost around 3 billion US dollars). At the time of writing
(early 2012), an individual’s DNA sequence could be determined for only a
few thousand dollars, and the future likelihood of affordable routine genome 
sequencing for everyone may herald a new era of ‘personalised medicine’ in 
which genetics could be used to predict susceptibility to disease and response 
to treatments. The ethical debate surrounding the ability to predict an 
individual’s heritable characteristics is not something there is space to explore 
here, but it is likely to have profound effects on the use of such technology. 
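The scale of the fall in cost described above can be checked with a few lines of arithmetic. The genome size and the cost of about $1 per base pair are the approximate figures quoted in the text; the 2012 figure of 3000 US dollars is an assumed stand-in for ‘a few thousand dollars’, so the result is an order-of-magnitude illustration only.

```python
GENOME_SIZE_BP = 3e9            # approximate human genome size quoted in the text
COST_PER_BP_1990_USD = 1.0      # approximate cost per finished base pair before 1990
ASSUMED_COST_2012_USD = 3000.0  # assumption standing in for "a few thousand dollars"


def genome_cost(genome_size_bp, cost_per_bp):
    """Total cost of sequencing a genome at a fixed cost per base pair."""
    return genome_size_bp * cost_per_bp


cost_1990 = genome_cost(GENOME_SIZE_BP, COST_PER_BP_1990_USD)
implied_cost_per_bp_2012 = ASSUMED_COST_2012_USD / GENOME_SIZE_BP
fold_reduction = cost_1990 / ASSUMED_COST_2012_USD

print(f"cost at 1990 prices: {cost_1990:.1e} US dollars")
print(f"implied 2012 cost per base pair: {implied_cost_per_bp_2012:.1e} US dollars")
print(f"approximate fold reduction: {fold_reduction:.1e}")
```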


From a genetic perspective, one of the major outcomes of genome sequencing, 
particularly of the human genome, has been the observation of widespread 
genetic variation not only between species, but also between individuals of the 
same species. 


5.11 Variation and evolution of genomes 


As the genomic DNA sequences of an increasing number of species have been 
determined, comparison of gene sequences within and between genomes has
helped to reveal the functions and origins of many genes and has provided 
some of the strongest evidence supporting phylogenetic relationships during 
evolution (Section 1.2.2). Genes in different species that share a degree of 
sequence similarity due to their common evolutionary origins are known as 


Chapter 5 Genes and genomes 


homologous genes (the term ‘homologous’ is often used more loosely, and 
incorrectly, to mean ‘similar’). If the DNA sequence of a specific human gene 
is compared to its mouse counterpart, although the two genes clearly encode 
very similar proteins, they are not identical. That is, there are DNA sequence 
differences between them. How do genomes change in sequence during the 
evolutionary history of a species? 


DNA replication and repair mechanisms are remarkably accurate but errors 
occur, albeit at a low rate in most organisms. 


■ What is a typical overall error rate during mammalian DNA replication?


□ Typically one error per 10⁹ bases copied (Section 5.3).


The 1000 Genomes Project (launched in 2008) is an international project 
which set out to determine the DNA sequence of at least 1000 human 
genomes (and indeed set its sights at a higher final total), and thereby identify 
the genome locations where there is genetic variation in human populations. 
Much of the genetic variation within H. sapiens takes the form of single 
nucleotide polymorphisms (SNPs): sites in the human genome where there 
are single nucleotide differences between individuals. 
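Identifying SNPs comes down to comparing the same stretch of genome in different individuals, position by position. The sketch below assumes two short sequences that are already aligned and of equal length; both sequences are invented for illustration, and the function name is hypothetical.

```python
def find_snps(seq_a, seq_b):
    """Return (position, base_in_a, base_in_b) for every single-base difference
    between two aligned sequences of equal length (positions count from 1)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and of equal length")
    return [(i + 1, a, b)
            for i, (a, b) in enumerate(zip(seq_a, seq_b))
            if a != b]


# Invented sequences from the same locus in two individuals.
individual_1 = "ACATAATTGGGACTAT"
individual_2 = "ACATAGTTGGGACTCT"

for position, base_1, base_2 in find_snps(individual_1, individual_2):
    print(f"position {position}: {base_1} in individual 1, {base_2} in individual 2")
```

A site would only be catalogued as a polymorphism if the difference is found repeatedly in a population; a comparison of just two sequences, as here, simply flags candidate positions.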


■ What are some of the processes that can result in variation at a particular
locus in the genome among individuals?


□ Variation can result from misincorporation of bases during replication,
nucleotide insertions or deletions and base changes due to chemical
modification (e.g. deamination, Section 5.6.1).


The majority of SNPs, though by no means all, are single nucleotide changes 
which have little or no effect on gene activity. 


The second major type of variation is gene copy number variation (CNV). 
Some genes may be present in a variable number of copies in different 
individuals of the same species, resulting in different levels of expression of 
the gene product. CNV is a result of the duplication of existing genes, or 
larger segments of the genome, over time. Gene duplication can also result in 
the evolution of gene families, structurally related genes which may have 
similar or different functions. 


5.11.1 Gene duplication and the origin of gene families 


There are a number of mechanisms by which segments of DNA may be 
duplicated, including errors during DNA replication, unequal crossing over 
between two homologous chromosomes during meiosis (Section 4.3.1 and 
Activity 4.1), or the action of transposons (Section 5.8.2). Transposons may 
carry a section of flanking genomic DNA with them when they excise from 
the genome and insert in another location, in some cases effectively 
duplicating some of the host’s genes. Over long periods of time, a single 
ancestral gene may therefore undergo a number of different duplication events. 
Occasional duplication of a whole chromosome or indeed of the whole 
genome (see below) brings about the duplication of many genes 
simultaneously. 


Figure 5.30 Schematic representation of the four-chain vertebrate haemoglobin
(with two α and two β chains). Each of the red disks represents a haem group,
the iron-containing component of haemoglobin, which carries oxygen.


Once a gene encoding a particular product is duplicated, redundancy is 
created. In other words, as long as one copy of the gene continues to operate 
normally and produce its essential gene product, the other gene copy is 
surplus to requirements and is freed from selection pressure. One of the newly 
duplicated genes may therefore change in a way that prevents it from 
producing its original product without any deleterious consequences for the 
fitness of the organism. Mutation and natural selection can thus result in 
divergence of the sequence of the two genes during subsequent generations: 
each gene can follow its own evolutionary path. In many cases, one of the 
duplicated genes simply acquires one or more mutations that prevent 
production of, or abolish the function of, its product. Such a permanently 
inactivated gene is known as a pseudogene. One of the duplicates may, 
however, acquire a new function. A gene family is a collection of structurally 
related genes (some may be functional, some may be pseudogenes) that have 
appeared during evolution by means of repeated duplication and divergence 
from a single ancestral sequence. 


An example of a gene family is that of the human globin genes. Globins are 
proteins that play an important role in oxygen transport in all vertebrates, and 
in many invertebrates. The simplest form of globin, found in certain marine 
worms, a few insects and some fish, is a monomeric (single polypeptide) 
protein encoded by a single gene. Human haemoglobin, the major component 
of red blood cells, is in contrast a tetrameric (four polypeptide) unit composed 
of two α (alpha) globin and two β (beta) globin polypeptides (Figure 5.30).
Each type of α and β chain is encoded by a corresponding α and β gene that
arose by duplication followed by mutation of a single ancestral globin gene.
During evolution, both α and β genes have in fact undergone several distinct
duplication events. As a result, in humans there are a number of α genes
arranged in a cluster on chromosome 16, while a number of β genes are
clustered separately on chromosome 11. Both groups include non-functional
pseudogenes, which are denoted by ψ (the Greek letter ‘psi’) within their
names. The closely related myoglobin gene lies on chromosome 22.
Myoglobin has similar structure and properties to haemoglobin, but is found
in muscles and is monomeric.


Figure 5.31 shows the evolutionary history of the oxygen-carrying globin 
genes in humans, deduced from sequence similarity in different species, 
starting with the single-chain globin gene at the top and finishing at the 
bottom with the α- and β-globin gene clusters and myoglobin found in
current-day primates. The approximate dates at which each gene duplication 
event is thought to have occurred are shown at each branch point in the 
diagram, and are correlated with major events in animal evolution. First, the 
single ancestral gene was duplicated. One copy underwent no further 
duplication, and became the myoglobin gene. The other copy of the globin 
gene underwent further duplications later in evolution, yielding the genes of 
the α- and β-globin families, which were favoured by natural selection and so
were retained. 




Figure 5.31 The ancestry of the human globin genes deduced from sequence 
comparisons. The timescale on the diagram runs from top to bottom, i.e. more 
ancient gene duplication events lie towards the top, and more recent events 
towards the bottom. The appearances in the fossil record of new animal taxa are 
indicated as horizontal blue bars, but these do not correspond specifically with the 
globin gene duplication/divergence events. 


5.11.2 Chromosome and genome duplications 


Duplication events are not confined to single genes. When chromosomes of 
closely related species are compared, large chromosome rearrangements are 
frequently observed. Chromosome rearrangements include duplications (where 
a section of chromosome has been duplicated), deletions (where a section of 
chromosome has been lost) and inversions (where a section of chromosome 
has been flipped in orientation relative to the ‘normal’ chromosome order). 


The duplication of sections of genomes containing many genes (known as
segmental duplication) is quite a common event. Segmental duplications may 
be located on the same chromosome or on different chromosomes. In some 
cases, the segments of duplicated DNA are very large, sometimes several 
hundreds of thousands of base pairs in length. One of the more surprising 
findings from human genome sequencing projects is that almost 5% of the 
human genome is made up of this type of duplication. Individuals differ 
widely in the detail and extent of individual duplications; in some cases, many
copies of a segment are present. We currently know very little about how or 
why these segmental duplications have arisen, but they have probably played 
a significant role in the creation of new human genes and gene families during 
evolution. 


Whole genome duplication and even triplication are relatively frequent events 
in the evolution of plant species. Notably, modern wheat varieties are known 
to be hexaploid derivatives of their ancestral forms; that is, the genome has 
triplicated from the initial diploid state to give six copies of the haploid 
genome. While most common in plants, cases of apparent genome duplication 
in animals are known. For example, a whole genome duplication appears to 
have occurred during the evolution of bony fish, and more recently in the 
genome of the African clawed frog Xenopus laevis, which is double the size 
of that of other living species in the genus, implying that a genome doubling 
event occurred at some time after the appearance of the genus Xenopus. 


5.11.3 Genome-wide association studies (GWAS) 


Conventional genetics experiments are impossible to perform on humans, for 
obvious ethical and practical reasons, but the identification of large numbers 
of SNPs, CNVs and other genetic variants as a result of genome sequencing 
has brought with it a new era of human genetics. 


Many millions of SNPs have been identified by comparing individual human 
genome sequences. Genome-wide association studies (GWAS) make use of 
SNPs to look for an association between genetic variation and certain traits, 
such as the risk of developing a disease. SNPs only very occasionally impact
directly on gene expression; the great majority of SNPs have no discernible
phenotypic impact. However, while the majority of SNPs are not directly
associated with causing a disease or susceptibility to a disease, those that lie
very close to (i.e. are linked to) disease-related genes can be used as ‘markers’
in genetic linkage studies to identify whether a disease-related allele is
present. SNPs can thus be thought of as silent mutations that provide a way of
‘mapping’ an individual’s SNP genotype, i.e. the particular combination of
SNPs in their genome, and testing whether it correlates with a particular
characteristic, such as disease susceptibility.


Recall from Section 4.4.3 that the recombination frequency between two loci 
is related to the physical distance between them. Therefore genes and nearby 
SNPs that are very close together on the chromosome will tend not to be 
separated by recombination. A disease-related gene allele can therefore often 
be followed through the generations just by detecting a particular combination 
of SNP variants that remain linked to it. Such a combination of SNPs is 
known as a haplotype. Just identifying a few SNP variants in a stretch of 
several hundred thousand bases can be enough to confirm that a particular 
gene allele is present, which is much more efficient than sequencing the whole 
DNA region in every individual. Typically, a GWAS uses high-throughput 
analytical techniques to allow researchers to sample several hundred thousand 
known SNP locations in each individual subject, and then rapidly analyse the 
pattern of variation across the whole genome. These studies can identify the 
genetic contribution to disease by comparing the DNA of two groups of
participants: people with the disease and a similar group of people without the 
disease. The SNP genotype of the DNA from each individual is determined 
and analysed to reveal any association between a particular SNP haplotype 
and the group that have the disease (Figure 5.32). 

[Figure 5.32: Manhattan plot. y-axis: log P value; x-axis: chromosome number, 1 to 22]


Figure 5.32 A genome-wide association study of single nucleotide polymorphisms associated with myocardial 
infarction (commonly known as a heart attack). This type of plot is referred to as a Manhattan plot. The x-axis refers 
to several thousand locations along the human genome (by chromosome, 1 to 22) where there are known to be SNPs,
each represented by a dot. The y-axis refers to probability (the P value) that the presence of a particular SNP is 
associated with the presence of the disease. You can see that variation in a particular SNP located on chromosome 9 
is associated with a high probability of the disease. 
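
The comparison of cases and controls can be made concrete with a small calculation. The following Python sketch is an illustration only, not the method used in the study shown in Figure 5.32: it assumes a simple chi-square test on allele counts at a single SNP, and the function name allele_association and all of the counts are hypothetical. A Manhattan plot conventionally displays the negative logarithm (to base 10) of each P value, so that the smallest P values give the tallest points.

```python
import math


def allele_association(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2 x 2 table of allele counts.

    Returns (chi_square, p_value). Every row and column total must be non-zero.
    """
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [case_alt + control_alt, case_ref + control_ref]
    total = sum(row_totals)

    chi_square = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi_square += (table[i][j] - expected) ** 2 / expected

    # For one degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


# Hypothetical counts: the variant allele is seen 60 times among 100 case
# alleles and 40 times among 100 control alleles.
chi_square, p_value = allele_association(60, 40, 40, 60)
print(chi_square, p_value, -math.log10(p_value))
```

Because a GWAS examines several hundred thousand SNP locations, a single SNP would need a far smaller P value than the one produced by these made-up counts before the association could be taken seriously.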


To date, these studies have identified both genetic risk factors and protective
genetic factors for asthma, cancer, diabetes, heart disease, mental illness, and several
other human diseases. GWAS therefore offers the opportunity to take a 
combined molecular and genetic approach to identifying the gene variants that 
contribute to many medical conditions. 


Summary of Sections 5.10 and 5.11 


• Comparison of the genomic DNA of different species has helped to reveal
the function and evolutionary origin of many genes, and also the enormous
genetic variation between individuals of a species.

• The two main types of genomic variation are single nucleotide
polymorphisms (SNPs) and gene copy number variation (CNV).

• During evolution, gene copy numbers may change via duplication of
genomes, segments of genomes or individual genes. This is how gene
families have arisen.

• Genome-wide association studies (GWAS) can help to identify linkage
between genome variation and complex traits such as disease
susceptibility.




5.12 Final word 


This chapter has described the structure and function of prokaryotic and 
eukaryotic genomes, and how the genetic information within the genome is 
accurately copied and passed on to daughter cells. The genomes of 
prokaryotic organisms are relatively compact and small, enabling these 
organisms to respond rapidly and efficiently to environmental challenges. In 
contrast, most eukaryotic genomes are larger and more complex. Despite these 
differences, eukaryotes and prokaryotes share many systems by which 
replication errors and damage to DNA may be corrected or repaired. 
Nevertheless, occasional sequence changes are inherited and give rise to 
mutations. Advances in genome analysis techniques, summarised in this 
chapter, have begun to unravel the properties and evolutionary history of 
complex eukaryotic genomes. 


The next chapter looks in more detail at the control of gene expression, and 
the essential role played by some non-coding DNA sequences. 


5.13 Learning outcomes 


5.1 Explain how the structure of DNA relates to the mechanisms of DNA 
replication. 


5.2 Describe the roles of the major components of the replication complex in 
DNA replication. 


5.3 Describe the mechanisms by which the accuracy of DNA replication is 
ensured, and by which damaged DNA can be repaired. 


5.4 Use the genetic code to determine the sequence of a gene product. 


5.5 Explain the effects of different classes of mutation on the expression or 
function of a gene product. 


5.6 Compare the general features of gene structure and genome composition 
in prokaryotes and eukaryotes. 


5.7 Outline some of the techniques used to study gene sequence and function. 


Chapter 6 The control of gene 
expression 


6.1 Introduction 


This chapter considers gene expression, the process by which the information 
in the genome of an organism is used to synthesise gene products, which are 
either proteins or functional RNAs. The gene encodes the linear sequence of a 
gene product; but in addition, the DNA flanking the coding region contains 
much more information in the form of regulatory DNA sequences that specify 
when, and in what type of cell, the gene product is expressed. Precise 
regulation of gene expression is crucial. If a protein is produced in a cell at 
the wrong level or at the wrong time, the effect may be devastating. 


You should bear in mind an important difference between the role of gene 
regulation in single-celled prokaryotes and in multicellular eukaryotes. Gene 
regulation allows bacterial cells to adjust rapidly to changes in their 
environment, and to optimise their growth and proliferation. Many eukaryotic 
genes are also regulated in response to external environmental changes, but 
additionally gene regulation in multicellular organisms drives the development 
of the different types of specialised cells. All of the cells in a multicellular 
organism originate from a single cell (the zygote) and therefore all carry the 
same genetic information, but during development of the organism different 
patterns of gene expression result in different types of cell with unique 
properties. 


■ What is the alternative name for this process of cell specialisation?


□ Differentiation.


So, how is gene expression controlled? In order to answer this question, it is 
first necessary to examine in more detail the processes by which the gene 
products (i.e. RNAs and proteins) are synthesised using the information 
encoded in DNA. The first part of this chapter will recap the processes by 
which proteins are synthesised, and the remainder of the chapter will look in 
more detail at the mechanisms of gene expression and how it is regulated in 
prokaryotes and eukaryotes. 


6.2 An overview of gene expression 


You will recall from Chapter 4 that a gene is a heritable unit that contributes 
to the physical characteristics (the phenotype) of an organism. It consists of a 
segment of genomic DNA that specifies the structure of a polypeptide or a 
functional RNA (e.g. a ribosomal RNA or a transfer RNA). In Activity 5.1 
Parts 1 and 2 you observed how complementary base pairing involving 
hydrogen bonding between nucleotide bases in the two strands of the DNA 
double helix plays an essential role in: (1) the stability of the DNA molecule, 
(2) accurate replication to produce two identical DNA double helices 
(Section 5.3), and (3) the process of gene transcription. 




Figure 6.1 shows the two main stages in the information flow from DNA to 
protein. In the first step, transcription, a segment of sequence on only one 
strand of genomic DNA (the template strand) is transcribed to produce an 
intermediary RNA molecule called a messenger RNA (mRNA). 
Complementary base pairing between the DNA template strand and 
ribonucleotides produces an mRNA that is complementary to the DNA 
template strand. In the second step, translation, the sequence of the mRNA is 
translated into the amino acid code, to produce a polypeptide. 


■ What unit along the mRNA strand encodes a single amino acid in the
polypeptide chain?


□ Each ‘triplet’ of three bases in the mRNA sequence, known as a codon,
specifies one amino acid in the polypeptide chain. 


[Figure 6.1 diagram: DNA divided into codons (non-template strand and template strand); transcription; messenger RNA (complementary to the DNA template strand), also divided into codons; translation; amino acids 1 to 6]


Figure 6.1 The relationship between the codons in DNA and mRNA and the 
amino acids in the encoded polypeptide. The sequence of the DNA template strand 
is transcribed into mRNA (note that RNA has the base uracil (U) instead of 
thymine (T)) and the mRNA sequence is translated into the polypeptide sequence. 
Each triplet of bases (a codon) specifies a particular amino acid. 


Note that the flow of information occurs in only one direction: from DNA 
sequence, via RNA, to protein sequence, and never backwards from protein to 
nucleic acid (DNA or RNA). This one-way flow of information (DNA makes
RNA makes protein) is often called the central dogma of molecular biology,
first proposed by Francis Crick in 1958.


6.2.1 The first stage of gene expression: transcription of 
DNA into RNA 


Like DNA, RNA is a linear polymer composed of four different types of 
nucleotides linked together by covalent phosphodiester bonds (Figure 6.2a and 
Section 5.2.1). However, while DNA is double-stranded, RNA is usually
single-stranded (although it is able to form internal base pairs between bases 
in different parts of the single strand). The nucleotides in RNA are 
ribonucleotides containing the 5-carbon sugar ribose (DNA contains 
deoxyribose). RNA also differs from DNA in that it contains the base uracil 
(U) instead of thymine (T); however, since U also base-pairs with A, a DNA 
strand can act as a template for RNA strand synthesis by complementary base 
pairing. 


[Figure 6.2 diagram: (a) an RNA strand built from phosphate, sugar and base units joined by phosphodiester bonds; (b) the non-template DNA strand (sense strand), the RNA transcript and the template DNA strand (antisense strand), with 5’ and 3’ ends marked]


Figure 6.2 DNA and RNA strands in transcription. (a) RNA is synthesised from 
ribonucleotide precursors with three linked phosphate groups. As each 
ribonucleotide is added to the 3’ end of the growing RNA strand, two phosphates 
are removed from the ribonucleotide, releasing energy for phosphodiester bond 
formation. (b) It is necessary to discriminate between the two DNA strands 
because only one acts as the template for transcription of a gene. Except for the 
substitution of U for T, the RNA transcript has the same sequence of bases as the 
complementary DNA strand (which is therefore sometimes called the sense strand, 
while the template DNA strand is known as the antisense strand). RNA is always 
synthesised in a 5’ to 3’ direction, reading the template DNA strand in the 3’ to 
5' direction. 


Whilst the deoxyribonucleotides composing DNA contain the
5-carbon deoxyribose sugar (Figure 5.2a), ribonucleotides
contain ribose sugar which has a second hydroxyl group on the 2’
carbon, but is otherwise identical to deoxyribose.




Half-life is the period of time it 
takes for a substance undergoing 
decay to decrease in amount by 
one-half. 



During transcription, the two DNA strands unwind in the region that is being 
transcribed (Figure 6.2b). Transcription is different from DNA replication 
because only one of the DNA strands, the template strand, is ‘read’ to 
produce the single-stranded RNA copy. The other DNA strand is known as
the non-template strand (or the sense strand, because its sequence corresponds
to that of the transcribed RNA), while the template DNA strand is also known
as the antisense strand (Figure 6.2b).


The synthesis of RNA molecules in cells is carried out by enzymes called 
DNA-dependent RNA polymerases. Throughout this chapter, the shortened 
name RNA polymerase will be used for these enzymes. RNA polymerase 
binds to the DNA at the start of the gene and separates the two strands by 
breaking the hydrogen bonds between complementary nucleotides. Unlike 
DNA polymerase, RNA polymerase is capable of initiating polymerisation of 
a new RNA chain; it doesn’t require a primer to provide a 3’-OH. RNA is
synthesised from precursors known as ribonucleoside triphosphates (ATP, 
GTP, CTP and UTP). The polymerase moves along, ‘reading’ the base 
sequence of the template DNA strand in a 3’ to 5’ direction, sequentially 
linking ribonucleotides to the 3’ end of the new RNA chain. Synthesis of the 
RNA transcript therefore always proceeds in a 5’ to 3’ direction (Figure 6.2b). 


■ What is the sequence of the RNA transcribed from the following template
DNA strand 3’ AAGCTCGACTTGACT 5’?


□ The sequence of the RNA transcript is complementary to the template
strand, i.e. 5’ UUCGAGCUGAACUGA 3’.
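
The answer to this question can be checked with a few lines of Python. This is a minimal sketch, assuming that the template strand is supplied as a string written 3’ to 5’ (so that the resulting RNA reads 5’ to 3’ from left to right); the function names are hypothetical.

```python
# Base-pairing rules for transcription: each DNA template base specifies
# the complementary ribonucleotide (A in the template pairs with U in RNA).
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5):
    """Template strand written 3' to 5'; returns the RNA written 5' to 3'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5)


def transcript_from_sense(sense_5_to_3):
    """The RNA has the same sequence as the sense strand, with U in place of T."""
    return sense_5_to_3.replace("T", "U")


print(transcribe("AAGCTCGACTTGACT"))             # UUCGAGCUGAACUGA
print(transcript_from_sense("TTCGAGCTGAACTGA"))  # UUCGAGCUGAACUGA
```

Both routes give the same transcript, which is why the non-template strand is called the sense strand.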


As the RNA polymerase moves along the template strand, the two DNA 
strands behind it rewind into the double helix. How the RNA polymerase 
recognises where to start and where to stop transcribing is addressed later in 
the chapter. 


The different forms of transcribed RNA 


Transcribed RNAs can be divided into two major groups, coding RNAs and 
non-coding RNAs. Coding RNAs are the transcripts of protein-coding genes. 


■ What is the alternative name given to a mature protein-coding RNA?


□ Messenger RNA (mRNA).


Thousands of different genes are being transcribed in a cell at any one time. 
mRNAs are usually very short-lived molecules; the half-life of bacterial 
mRNA after synthesis is no more than a few minutes, and in most eukaryotic 
cells the majority of mRNAs are degraded within a few hours. 


■ What is the advantage to a cell of producing short-lived mRNAs?


□ This allows cells to rapidly adjust the rate of protein synthesis in
response to their changing needs. If all mRNAs were very long-lived, 
they might continue to be translated when their protein products were no 
longer required. 
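
The effect of a short half-life can be illustrated numerically. The sketch below assumes simple exponential decay with a constant half-life; the 3-minute half-life is a hypothetical value chosen only to be consistent with ‘a few minutes’ for a bacterial mRNA, and the function name is likewise an assumption.

```python
def fraction_remaining(elapsed, half_life):
    """Fraction of the original molecules left after `elapsed` time units.

    `elapsed` and `half_life` must be in the same units (minutes here).
    """
    return 0.5 ** (elapsed / half_life)


HALF_LIFE_MIN = 3  # hypothetical half-life of a bacterial mRNA, in minutes

for minutes in (3, 9, 30):
    print(minutes, fraction_remaining(minutes, HALF_LIFE_MIN))
```

Under these assumptions, half of the mRNA is left after 3 minutes, one-eighth after 9 minutes and less than one-thousandth after 30 minutes, so translation of the message stops soon after its transcription is switched off.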


mRNA, however, constitutes only about 4% of the total quantity of RNA in a 
eukaryotic cell. In fact, most of the RNA is non-coding RNA (ncRNA), the
product of non-protein-coding genes. These ncRNAs have a range of different 
functions, all of which are performed directly by the RNA molecules 
themselves. The most abundant type of ncRNA is ribosomal RNA (rRNA), 
which makes up over 90% of the total RNA in actively dividing cells. rRNAs 
are important structural components of the ribosome. Another type of ncRNA 
is transfer RNA (tRNA) which carries individual amino acids to the 
ribosome, where they are incorporated into a newly translated polypeptide 
chain (Activity 5.1, Part 2). Both of these types of RNA are produced by 
transcription in the same way as mRNA, but they are not translated into 
protein. 


Eukaryotes also contain a large variety of other much shorter ncRNAs, some 
of which (for example, the class known as microRNAs) have a role in 
controlling gene expression (Section 6.5.4). 


6.2.2 The second stage of gene expression: translation of 
mRNA into protein 


Ribosomes in the cell cytoplasm bind to the mRNA and use the sequence of 
nucleotide bases in the mRNA sequence to specify the order of amino acids in 
the polypeptide chain (Activity 5.1, Part 2). As you learnt in Chapter 5, the 
relationship between the three-base codons in RNA and the amino acids they 
specify is known as the genetic code (Figure 5.13). There are 64 possible 
triplet codons, but only 20 amino acids occur naturally in proteins, so the 
excess of codon combinations means that there are actually several codons for 
most of the amino acids. In addition there is a ‘start’ codon (AUG), which
also encodes methionine, and there are three ‘stop’ codons (UAA, UAG and 
UGA) that signal the termination of translation. 


■ Using the genetic code table in Figure 5.13, what is the sequence of the
polypeptide translated from the following mRNA strand:
5' AUGGCCGGACAUGCUUCGCGG 3’?


□ The polypeptide is: Met Ala Gly His Ala Ser Arg (or MAGHASR).
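
The same translation can be carried out by a short Python sketch. It assumes the standard genetic code, written compactly as one string of single-letter amino acid symbols in the order of the bases U, C, A, G at each codon position (with * marking a stop codon); the names GENETIC_CODE and translate are hypothetical, and the function simply reads every complete codon from the first base onwards.

```python
from itertools import product

BASES = "UCAG"
# Single-letter amino acid symbols for the 64 codons UUU, UUC, UUA, UUG, UCU, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate an mRNA (written 5' to 3') codon by codon from its first base."""
    usable = len(mrna) - len(mrna) % 3  # ignore any incomplete final codon
    codons = [mrna[i:i + 3] for i in range(0, usable, 3)]
    return "".join(GENETIC_CODE[codon] for codon in codons)


print(translate("AUGGCCGGACAUGCUUCGCGG"))  # MAGHASR
```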


Recognition of mRNA codons by tRNA 


The link that enables conversion of the four-letter code of DNA and mRNA 
into the 20-letter code of proteins is transfer RNA (tRNA). tRNAs are single- 
stranded RNA molecules that fold up in a particular conformation (three- 
dimensional shape) due to internal complementary base pairing between 
nucleotides in different parts of the single tRNA strand (Figure 6.3a). The 
conformation of the tRNA is such that a sequence at the 3’ end of the tRNA 
is exposed, and it is here that the amino acid attaches. At another site on the 
tRNA is a region where there are three exposed, unpaired bases; these 
constitute the anticodon. 


■ How is the tRNA involved in translation of an mRNA sequence?


□ The anticodon of the tRNA can interact with a complementary codon in
mRNA by complementary base pairing (Activity 5.1). The amino acid 
attached to the tRNA is thus brought into position to be incorporated into 
a polypeptide chain. 




[Figure 6.3 diagram: panels (a) and (b), with the labels: anticodon, UAC, amino acid binding site, 3’ and 5’ ends; panel (c) is the table below]

(c)
BACTERIA
wobble codon base (mRNA) | possible anticodon bases (tRNA)
U | A, G or I
C | G or I
A | U or I
G | C or U

EUKARYOTES
wobble codon base (mRNA) | possible anticodon bases (tRNA)
U | G or I
C | G or I
A | U
G | C


Figure 6.3 Transfer RNA (tRNA). (a) The generalised structure of a tRNA 
molecule showing the three-dimensional shape conferred by complementary base 
pairing within the single strand. (b) A schematic representation of a tRNA carrying 
methionine. (c) The wobble base pairs that commonly form between the third (3’) 
base in an mRNA codon and the 5’ base of the tRNA anticodon in bacteria and 
eukaryotes. A ‘non-Watson-Crick’ pairing between uracil (U) and inosine (I) is 
represented here as a single hydrogen bond (rather than the two or three hydrogen 
bonds of Watson-Crick base pairs). 


A tRNA with a particular anticodon sequence always carries the same amino 
acid. For example, 5’ AUG 3’ is the translation ‘start’ codon, but it is also the 
codon for the amino acid methionine (Met), and is recognised by a tRNA with 
the anticodon 5’ CAU 3’ (Figure 6.3b). Some amino acids are encoded by 
several different codons. In fact, only methionine (AUG, which is also the 
start codon) and tryptophan (UGG) each have a single type of codon. 


■ Look at the genetic code in Figure 5.13 and identify all the codons for
valine (Val). How do the codons compare?


□ There are four codons for valine: GUU, GUC, GUA and GUG; the first
two bases are identical for all of them, while the third varies. 


Having several codons for some amino acids is referred to as degeneracy or 
redundancy of the genetic code. Degeneracy of the genetic code does not 
imply any ambiguity; each type of tRNA molecule can be attached to only 
one type of amino acid. The tRNAs that recognise the four valine (Val) codons, for
example, never carry any other type of amino acid.
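
Degeneracy can be explored by grouping the 64 codons according to the amino acid they specify. The Python sketch below assumes the standard genetic code, written as a string of single-letter amino acid symbols (with * for a stop codon); the variable names are hypothetical.

```python
from collections import defaultdict
from itertools import product

BASES = "UCAG"
# Single-letter amino acid symbols for the 64 codons UUU, UUC, UUA, UUG, UCU, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Group the codons by the amino acid (or stop signal) that they specify.
codons_for = defaultdict(list)
for codon, amino_acid in GENETIC_CODE.items():
    codons_for[amino_acid].append(codon)

single_codon = sorted(aa for aa, codons in codons_for.items() if len(codons) == 1)

print(codons_for["V"])   # ['GUU', 'GUC', 'GUA', 'GUG']
print(single_codon)      # ['M', 'W']
print(codons_for["*"])   # ['UAA', 'UAG', 'UGA']
```

The mapping runs in one direction only: an amino acid may have several codons, but each codon appears in the table exactly once, which is why degeneracy does not make the code ambiguous.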


If there were a one-to-one correspondence between tRNA molecules and 
codons, all cells would require 64 different types of tRNA molecules (61 
carrying amino acids and three that recognise stop codons). However, most 
organisms actually have only around 40-50 different types of tRNAs, because 
some anticodons can pair with more than one codon due to a phenomenon 
known as wobble base pairing. This occurs because the 5’ base of the 
anticodon (which base-pairs with the 3’ base of the mRNA codon) does not 
have such strict base pairing requirements as the other two bases, and can in 
some cases form ‘non-Watson-Crick’ base pairs (i.e. something other than A:U
and C:G base pairs). Many tRNAs contain unusual or modified forms of
bases that are particularly prone to wobble. For example, if the base at the 5’ 
wobble position of a tRNA anticodon is inosine (I), the tRNA can recognise 
any one of three different codons in bacteria and either of two codons in 
eukaryotes (Figure 6.3c). Hence fewer than 64 types of tRNA anticodon are 
actually required to recognise all of the possible codons. 
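
The wobble rules of Figure 6.3c can be written as a small lookup table and then read ‘backwards’, from the anticodon base to the codon bases it can pair with. In this Python sketch the pairings are transcribed from Figure 6.3c, while the names WOBBLE and codon_bases_read_by are hypothetical.

```python
# For each group of organisms: third (3') base of the mRNA codon mapped to the
# possible 5' bases of the tRNA anticodon (I = inosine), as in Figure 6.3c.
WOBBLE = {
    "bacteria": {
        "U": {"A", "G", "I"},
        "C": {"G", "I"},
        "A": {"U", "I"},
        "G": {"C", "U"},
    },
    "eukaryotes": {
        "U": {"G", "I"},
        "C": {"G", "I"},
        "A": {"U"},
        "G": {"C"},
    },
}


def codon_bases_read_by(anticodon_base, group):
    """Third-position codon bases that the given 5' anticodon base can pair with."""
    return sorted(
        codon_base
        for codon_base, partners in WOBBLE[group].items()
        if anticodon_base in partners
    )


print(codon_bases_read_by("I", "bacteria"))    # ['A', 'C', 'U']
print(codon_bases_read_by("I", "eukaryotes"))  # ['C', 'U']
```

The output reproduces the statement in the text: a tRNA with inosine at the wobble position can recognise three codons in bacteria and two in eukaryotes.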


Translation is initiated when a ribosome assembles on the mRNA at the start 
of a protein-coding sequence. The first tRNA to bind to the mRNA at the 
ribosome does so at the start codon (labelled codon 1 in Figure 6.4a) which 
has the sequence AUG (not all AUG codons act as start codons, however, and 
you will learn later on how the correct start codon of a gene is identified). 
Once the first tRNA has bound to the mRNA, a second one binds
(Figure 6.4a) and the ribosome links the two amino acids, forming what is
known as a peptide bond (Figure 6.4b). The first tRNA molecule is then 
released (Figure 6.4c) and the mRNA moves through the ribosome to present 
a new codon. tRNAs enter the ribosome one at a time, delivering amino acids 
which are linked to the growing polypeptide chain (Figure 6.4d). This is 
called the elongation phase of translation. Notice that the first amino acid in 
the polypeptide has a free amine group (-NH2) so it is said to form the amino
terminus or N-terminus of the polypeptide and the last amino acid added to
the polypeptide chain has a free carboxyl group (-COOH) and forms the
carboxy terminus or C-terminus. Polypeptides are always synthesised in the 
same direction: from the N-terminus towards the C-terminus. 


The final event in polypeptide synthesis is termination of translation, which is 
brought about by one of three specific stop codons in the mRNA sequence 
(UAA, UAG or UGA). When the stop codon is reached, synthesis stops and 
the completed polypeptide dissociates from the mRNA. 



[Figure 6.4 diagram: panels (a) to (d) showing the mRNA codons 1 to 6 (AUG CCU GCU GUU GGA AAG) at the ribosome, tRNAs base-paired to successive codons through their anticodons, the incoming tRNA (carrying an amino acid) which will bind to codon 2, the peptide bond, the NH2 and COOH groups, and the growing polypeptide chain]


Figure 6.4 Simplified scheme for translation. (a) An mRNA with a tRNA molecule already bound and a second 
about to bind. The ribosome is shown in green. (b) Two tRNA molecules are now bound to the mRNA, and a 
peptide bond has formed between the first two amino acids of the polypeptide chain (Met and Pro). (c) The first 
tRNA molecule is released. (d) A few steps further on in the process — the growing polypeptide chain now consists 
of six amino acids; the sixth amino acid (Lys) corresponds to codon 6. Note that the different components shown are 
not drawn to scale. 
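
The three phases of translation (initiation at a start codon, elongation one codon at a time, and termination at a stop codon) can be mimicked in a short Python sketch. It uses the six codons shown in Figure 6.4 (AUG CCU GCU GUU GGA AAG), but the flanking bases and the stop codon added around them are hypothetical. It also makes the simplifying assumption that the first AUG in the sequence is the start codon, which, as noted above, is not always true in a cell. The names GENETIC_CODE and translate_orf are assumptions.

```python
from itertools import product

BASES = "UCAG"
# Single-letter amino acid symbols for the 64 codons UUU, UUC, UUA, UUG, UCU, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_orf(mrna):
    """Translate from the first AUG to the first in-frame stop codon."""
    start = mrna.find("AUG")          # initiation (simplifying assumption)
    if start == -1:
        return ""
    polypeptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = GENETIC_CODE[mrna[i:i + 3]]
        if amino_acid == "*":         # termination at UAA, UAG or UGA
            break
        polypeptide.append(amino_acid)  # elongation, N-terminus first
    return "".join(polypeptide)


# 'GG' before the start codon and 'UAAGCU' after codon 6 are made up.
print(translate_orf("GG" + "AUGCCUGCUGUUGGAAAG" + "UAAGCU"))  # MPAVGK
```

The result, MPAVGK, corresponds to Met Pro Ala Val Gly Lys, in agreement with the amino acids named in the caption of Figure 6.4 (Met and Pro first, Lys sixth).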



6.2.3 mRNA and polypeptide processing in prokaryotes and
eukaryotes


You are now familiar with the steps by which the linear DNA code of a gene 
is used to synthesise a polypeptide sequence, but there are important 
differences in how these processes occur in prokaryotic and eukaryotic cells. 


Prokaryotes do not enclose their DNA in a nucleus, so transcription and 
translation both occur in the cytoplasm, and ribosomes can attach to a new 
bacterial mRNA and start translating it even while transcription of the mRNA 
is still being completed. Prokaryotic mRNA is thus immediately ready for 
translation with no further processing. 


In contrast, in eukaryotic cells, the DNA is separated from the cytoplasm by a 
double membrane, the nuclear envelope. Transcription occurs in the nucleus 
and the transcribed RNA must then be transported out of the nucleus and into 
the cytoplasm where the ribosomes are located (either ‘free’ or attached to the 
rough ER, Section 3.4.4). Unlike prokaryotic RNA, the primary RNA 
transcript produced in a eukaryotic cell requires extensive post- 
transcriptional modification in the nucleus before it is ready for export to 
the cytoplasm and translation. One of these modifications is RNA splicing 
(Figure 6.5) to remove some sections of the primary transcript. 


■ From your understanding of the organisation of the eukaryotic genome
(Chapter 5), why is post-transcriptional RNA splicing necessary for
eukaryotic but not prokaryotic mRNA?


□ Unlike prokaryotic genes, the protein-coding sequences of eukaryotic
genes (exons) are interrupted by introns (non-coding sequences). Introns 
must therefore be removed from the primary RNA transcript to produce a 
mature mRNA that can be accurately translated into the polypeptide. 


Splicing removes the introns and joins together the exons to form the mature 
mRNA (Figure 6.5). The 5’ and 3’ ends of the mature mRNA also undergo 
modifications, known as RNA capping and polyadenylation (pronounced 
‘polly-add-en-elation’), respectively, which will be described in more detail 
later in this chapter. These terminal modifications prepare the mRNA for 
export out of the nucleus through the nuclear pores (Section 3.4.3), and into 
the cytoplasm, where it is finally translated at ribosomes to produce a 
polypeptide. 
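
The outcome of splicing (though not its mechanism) can be pictured as cutting the exons out of the primary transcript and joining them in order. The Python sketch below assumes that the exon positions are already known; the sequence, the coordinates and the function name splice are all hypothetical, and capping and polyadenylation are not modelled.

```python
def splice(primary_transcript, exons):
    """Join the exons of a primary transcript in order, discarding the introns.

    `exons` is a list of (start, end) positions, counted from 0, with `end`
    being the position just after the last base of the exon.
    """
    return "".join(primary_transcript[start:end] for start, end in exons)


# A made-up primary transcript: exon 1, one intron, exon 2.
exon_1 = "AUGGCC"
intron = "GUAAGUUUUCAG"
exon_2 = "GGACAUUAA"
primary = exon_1 + intron + exon_2

mature = splice(primary, [(0, 6), (18, 27)])
print(mature)  # AUGGCCGGACAUUAA
```

In this example the mature mRNA is 15 bases long, compared with 27 bases for the primary transcript, and its codons can now be read without interruption.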


Eukaryotes also carry out a wide range of post-translational modifications of 
the translated polypeptide before it becomes a mature functional protein 
(whereas prokaryotes carry out very few post-translational modifications). 
Eukaryotic polypeptides may be modified in many different ways by different 
cellular enzymes, for example, cleavage of the polypeptide into smaller 
sections, or attachment of additional chemical groups (such as phosphate, 
carbohydrate or lipid groups) to particular amino acids. The ‘signals’ for many 
of these modifications are encoded in the amino acid sequence of the 
polypeptide itself. 




Figure 6.5 Stages in gene expression in eukaryotes: a summary of events 
showing the flow of information in a eukaryotic cell from nuclear DNA to a 
functional protein via primary RNA transcript, mature mRNA and newly 
synthesised polypeptide. Note the introns — the lighter bands in the DNA and the 
primary RNA transcript; introns are removed by splicing of the primary RNA 
transcript to form a mature mRNA. 


■ What other types of ‘signal’ sequences may be present in a polypeptide?


□ Some polypeptides contain specific signal sequences that ‘target’ them to
specific parts of the cell, such as the cell membrane or an organelle, or 
target them for secretion from the cell (Section 3.4.4). 


Ultimately, transcription and translation (accompanied in eukaryotes by 
modification of the mRNA and polypeptide gene products) result in the 
production of a functional protein, thereby completing the information flow 
(Figure 6.5). 


The control of the processes summarised in this section is examined in more
detail in the rest of this chapter. The next two sections look at the control of 
transcription in prokaryotes and eukaryotes, respectively. In both cases, it is 
control at the level of transcription that exerts the major influence on the 
amount and type of gene products present in cells. Transcription can be 
divided into three main stages. RNA polymerase must first bind to the DNA 
template at the start of a gene; a process called initiation. In the second stage, 
elongation, the RNA polymerase moves along ‘reading’ the DNA sequence 
and synthesising a complementary mRNA. In the final, termination stage, 
elongation ceases and the transcript dissociates from the template. 


Summary of Sections 6.1 and 6.2 


• A typical cell expresses only a fraction of its gene repertoire at any one
time. A multicellular organism with multiple cell types arises because
different sets of genes are expressed in different types of cells.

• The sequence of bases in the DNA molecule determines the amino acid
sequence of the polypeptide encoded by a protein-coding gene. Each type
of polypeptide in a cell is encoded by a specific gene.

• The stages of gene expression are: transcription of DNA into RNA
(followed by post-transcriptional modification in eukaryotes) and
translation of RNA into polypeptide (followed by post-translational
modification in eukaryotes).

• Transcription is the process of RNA synthesis, in which information
encoded in one of the strands of DNA is used as a template to produce a
single-stranded RNA copy. Accuracy is assured by complementary base
pairing in a similar way to DNA replication, except that in RNA, U (not
T) pairs with A. RNA polymerase synthesises RNA by linking together
ribonucleotides.

• During transcription, RNA polymerase ‘reads’ the template DNA in a 3’ to
5’ direction, and synthesises the RNA transcript in a 5’ to 3’ direction,
such that the transcript is complementary to the DNA template strand.

• Translation is the process of polypeptide synthesis that occurs at the
ribosome. A triplet codon of mRNA binds to the corresponding triplet
anticodon of a tRNA molecule and the amino acid attached to the tRNA
is attached to the previous amino acid in the polypeptide chain via a
peptide bond.

• Polypeptides are synthesised from the amino (-NH2) or N-terminus
towards the carboxy (-COOH) or C-terminus.

• The genetic code is degenerate; some amino acids are specified by more
than one codon.

• In bacteria, transcription and translation occur in the cytoplasm, close to
the DNA; while in eukaryotes, transcription occurs in the nucleus and the
mature mRNA is exported to the cytoplasm for translation.


6.3 Control of prokaryotic gene transcription 


Transcription is rather simpler in prokaryotes, so first you will look at an 
example of the control of bacterial transcription, before moving on to the 
more complex process of transcription in eukaryotes. 


6.3.1 Initiation of bacterial transcription 


Bacteria have a single type of RNA polymerase that catalyses synthesis of all
three main types of RNA (mRNA, rRNA and tRNA). The E. coli RNA
polymerase is one of the largest proteins in the cell and has five subunits
(Figure 6.6). There are four catalytic subunits (two large subunits, β and β’,
and two copies of the α subunit, which all together form the core polymerase
capable of synthesising an RNA chain) and a detachable regulatory subunit
called the sigma (σ) factor. The complete, active enzyme assembly can be
represented as ααββ’σ.


Figure 6.6 The structure of E. coli RNA polymerase attached to DNA (only the 
DNA template strand is shown). The active form of the enzyme has five subunits, 
and the complex is designated as ααββ’σ.


Transcriptional initiation begins when RNA polymerase binds to the double- 
stranded DNA upstream of (before the start of) a gene at a specialised region 
called the promoter (Figure 6.7a). Although the nucleotide sequences of gene 
promoters differ, some regions of the promoter are very highly conserved; that 
is, they are always very similar (but not necessarily identical), indicating that 
this part of the sequence is essential to promoter function. These conserved 
sequences are known as consensus sequences. Prokaryotic gene promoters
have a consensus sequence (5’ TATAAT 3’ on the non-template strand) referred to as the -10 box (or the
Pribnow box, after its discoverer) because it is about ten base pairs upstream
of the site where transcription starts (Figure 6.7a). Most prokaryotic promoters
also have a second consensus sequence (5’ TTGACA 3’) further upstream
from the transcription start site, which is known as the -35 box. The RNA
polymerase binds to gene promoters in DNA, and not elsewhere, because the
σ factor subunit of RNA polymerase specifically recognises the -10 and -35
consensus sequences.


Although all bacterial promoters share these consensus sequences, the rest of 
the promoter region differs between promoters. Bacterial cells in fact have 
several different types of σ factors that recognise the consensus sequences in
the context of slightly different promoter sequences, so different σ factors
have a role in specifying which genes are actively transcribed by the RNA 
polymerase. The overall promoter sequence determines the ‘strength’ of the 
promoter; that is, how often RNA polymerase successfully recognises and 
binds to the promoter and initiates transcription. In theory, RNA polymerase 
can bind to all promoters in the genome, so promoters must compete for a 
restricted pool of RNA polymerase molecules. A ‘strong’ promoter binds 
RNA polymerase molecules more often and hence will give a higher rate of 
transcription, producing more RNA transcript over a period of time, than a 
‘weak’ promoter. The frequency of initiation of transcription at a promoter can 
also be increased by other DNA binding proteins which associate with specific 
regulatory DNA sequences in the promoter region. You will encounter this 
type of regulation in the next section and later on in the chapter. 


Chapter 6 The control of gene expression 


The promoter sequence itself has polarity; that is, it has to be ‘read’ in a 
particular direction. As a result, when the RNA polymerase associates with the 
promoter, it is appropriately orientated to transcribe the correct DNA
template strand. The core promoter (sometimes called the basal promoter), 
which is present in all prokaryotic genes, includes the binding site for RNA 
polymerase and the transcription start site (Figure 6.7a). 


[Figure 6.7 labels. (a) RNA polymerase; promoter; −35 box; −10 box; transcription start site (+1); DNA template strand, 3′ to 5′; direction of RNA synthesis. (b) Unwinding of DNA molecule; RNA transcript; transcription bubble; rewinding of DNA molecule.]


Figure 6.7 (a) A prokaryote core promoter including the transcription start site (only the template DNA strand is 
shown). Two specific DNA sequences are recognised by the enzyme RNA polymerase, namely the −35 box and the
−10 box. The base at which transcription starts is denoted by +1. The RNA polymerase (blue oval shape) binds to
the promoter in double-stranded DNA to form a transcription initiation complex. (b) A transcription bubble. The 
RNA polymerase unwinds the DNA double helix and transcribes the DNA template strand to synthesise the RNA 
transcript. 


The complex formed once RNA polymerase is tightly bound to the core 
promoter is called a transcription initiation complex. Once bound, the 
DNA-protein complex converts from a closed complex (Figure 6.7a) into an 
open complex (Figure 6.7b) by the unwinding of a short stretch of the DNA 
double helix and separation of the two strands to expose 15-20 unpaired 
nucleotides on each strand. This is sometimes referred to as a transcription 
bubble. 


At this point the σ factor is released from the RNA polymerase and RNA
chain elongation begins. As the polymerase moves off, the promoter region is
exposed again. This ‘promoter clearance’ signifies the end of initiation. The
exposed promoter can immediately be bound by another molecule of RNA
polymerase which initiates a new transcript. Transcription of the RNA actually
starts at a specific base pair in the DNA template strand. This is called the
transcription start site and is defined as position +1 (Figure 6.7a). The
nucleotide immediately preceding it is defined as position −1; there is no
position 0. The RNA polymerase advances along the double helix, unwinding
the DNA and rewinding it behind, such that the transcription bubble moves
along the DNA, enabling one strand to act as the template for the synthesis of
the RNA transcript. RNA synthesis continues, at about 50 nucleotides per
second for bacterial RNA polymerase, until the enzyme encounters a
terminator sequence in the DNA (described in Section 6.3.4), at which point
the RNA transcript is released, and the RNA polymerase detaches from the
DNA template. The freed polymerase reassociates with another σ factor and
can bind to another gene promoter to start initiation of a new transcript.
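
Two pieces of bookkeeping in this paragraph can be made concrete with a short Python sketch: the numbering of positions relative to the start site (which skips 0), and the time needed to synthesise a transcript at about 50 nucleotides per second. The function names and the 3000-nucleotide transcript length are assumptions chosen for illustration, and indices are assumed to increase in the direction of transcription.

```python
def position_label(index: int, start_index: int) -> int:
    """Convert a 0-based index into a position relative to the start site.

    The start site itself is +1, the nucleotide before it is -1,
    and no nucleotide is ever labelled 0.
    """
    offset = index - start_index
    return offset + 1 if offset >= 0 else offset


def elongation_seconds(length_nt: int, rate_nt_per_s: float = 50.0) -> float:
    """Approximate time to synthesise a transcript at a constant rate."""
    return length_nt / rate_nt_per_s


# Six consecutive nucleotides, with the start site at index 3.
print([position_label(i, start_index=3) for i in range(6)])  # [-3, -2, -1, 1, 2, 3]

# A hypothetical 3000-nucleotide transcript at about 50 nucleotides per second.
print(elongation_seconds(3000))  # 60.0
```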


6.3.2 Transcription is controlled by the binding of 
transcription factors to regulatory DNA sequences 


The initiation step of transcription is the main point at which the level of gene 
expression is controlled in both prokaryotes and eukaryotes. As well as the 
core promoter described above (which includes the site where RNA 
polymerase binds and the transcription start site) almost all prokaryotic and 
eukaryotic genes also have several upstream (and occasionally downstream) 
regulatory DNA elements. These are short consensus sequences in DNA that 
are specifically recognised and bound by DNA binding proteins called 
transcription factors. These factors either promote the recruitment of RNA 
polymerase to the promoter, thereby increasing transcription of the gene (in 
which case they are called transcriptional activators), or they repress 
recruitment of the RNA polymerase and decrease transcription of the gene 
(transcriptional repressors). 


■ What other kinds of protein directly bind to DNA?


□ Some examples of DNA binding proteins are RNA polymerases, histones
and DNA replication and repair enzymes (Chapter 5).


Transcription factor proteins have two characteristic domains. The DNA 
binding domain interacts with the sugar–phosphate backbone and the bases of
the DNA strands in such a way that the protein fits tightly into grooves of the 
DNA double helix. The second domain, the transactivation domain, interacts 
with other transcription factors, or with proteins bound at the core promoter, 
to regulate formation of the transcription initiation complex. There are many 
types of transcription factors, but because they all need to bind to DNA they 
tend to share a few common structural features called DNA binding motifs. 
The precise amino acid sequence of the DNA binding motif forms a binding 
site of a specific shape that determines which particular DNA sequence is 
recognised by the protein factor. Transcription factors can be grouped into a 
few families on the basis of their common DNA binding motifs, which have 
been highly conserved through evolution, and are present in a range of 
proteins from very different organisms. Three typical DNA binding motifs, 
namely the helix–turn–helix motif, the zinc finger, and the leucine zipper, are
illustrated in Figure 6.8. 




■ What is a common feature of the interaction between the three DNA
binding motifs and the DNA helix shown in Figure 6.8?


□ All three of these DNA binding motifs interact with the major groove of
the DNA helix (Section 5.2.1).


[Figure 6.8 labels: base pair; sugar–phosphate; N-terminus; C-terminus. Panels (a) to (d) are described in the caption.]


Figure 6.8 The structural motifs of three major types of DNA binding proteins. 
(a) and (b) Frontal and lateral (side) views, respectively, of the helix–turn–helix
protein motif, which consists of three α-helices (in blue) that fit into the major
groove of the DNA helix. The α-helix and other forms of protein secondary
structure are described in more detail in Book 2, Chapter 1. (c) A cluster of three
zinc finger motifs, so called because they contain small finger-like projections each
consisting of an α-helix (blue) and a β-sheet (red) held together by a zinc atom
(orange). Zinc fingers are rare in prokaryotes, but are very common in eukaryotes.
(d) A leucine zipper motif formed from two α-helices, belonging to separate
protein molecules that together ‘grip’ the DNA molecule rather like a clothes peg. 




The next section describes an example of gene control in E. coli that involves 
two transcription factors from the helix–turn–helix structural family. One is a
transcriptional repressor called the lac repressor and the other is a
transcriptional activator called the catabolite activator protein (CAP). 


6.3.3 The bacterial operon


Bacteria live in a dynamic environment in which the levels of nutrients, 
metabolites and other molecules are constantly changing, so the ability to 
rapidly alter the level of proteins required to process these molecules is 
essential for survival. The bacterial genome is organised in such a way that 
genes encoding proteins involved in the same biochemical pathway, for 
example the enzymes and transporter proteins responsible for the import and 
catabolism of nutrients, are often grouped together in units called operons. An 
operon is expressed as a single polycistronic mRNA transcript, controlled by a 
single promoter (Section 5.7.1). Altogether, there are about 600 operons in the
E. coli genome. 


The lac operon of E. coli is perhaps the best-studied bacterial operon. It was
first described by François Jacob and Jacques Monod, who together won the
Nobel Prize in Physiology or Medicine in 1965 for their contribution to the
understanding of gene control. The lac operon coordinates the expression of
three genes required for the utilisation of lactose (the disaccharide that is the
major sugar found in milk) as a source of energy. These three genes are known as: lacZ
(which encodes β-galactosidase, an enzyme that breaks down lactose into
glucose and galactose); lacY (which encodes β-galactoside permease, a
membrane protein that transports lactose into the cell); and lacA (which
encodes β-galactoside transacetylase, an enzyme whose function is currently
unclear).


E. coli cells growing in a laboratory culture medium that contains glucose as 
the only source of carbon and energy have very low levels of β-galactosidase
and β-galactoside permease activity. If the cells are switched to a medium
containing lactose instead of glucose, however, the level of expression of the
β-galactosidase and β-galactoside permease proteins increases within about
20 minutes. If the bacteria are then switched back from the lactose-containing
medium to a glucose medium, β-galactosidase and β-galactoside permease
expression is shut off, or repressed, within minutes of the changeover. 


How are these rapid changes in gene expression triggered by the presence or 
absence of lactose? The answer lies in the binding of specific transcription 
factors to regulatory DNA elements in the lac operon. A region called the lac
control region lies immediately upstream of the three lac genes (lacZ, lacY
and lacA). The lac control region consists of a promoter region (P) and two
regulatory DNA elements called the activator site (AS) and the operator (O)
(Figure 6.9a). 


RNA polymerase is only able to bind to the promoter (P) and transcribe the
three lac genes into a single polycistronic mRNA when lactose is the main
energy source available to the bacterium. The bacterium has two mechanisms
that impose this regulatory control. Each of them employs a transcription
factor protein (in one case a repressor and in the other case an activator)
that binds to one of the regulatory DNA elements in the lac control region.


The lac transcriptional repressor


The first mechanism of regulating the lac operon employs a transcriptional
repressor called the lac repressor. The lac repressor is a protein encoded by
the lacI gene, which is situated near the lac operon. The lac repressor
protein is expressed constitutively, meaning that it is always ‘on’, and its
expression remains at a constant level in the cell. When the cell is growing in
a medium containing only glucose (no lactose), the lac repressor binds tightly
to the lac operator DNA sequence (Figure 6.9b) and blocks the binding of
RNA polymerase to the promoter. Consequently, there is no transcription from
the lac operon and the cell contains very little β-galactosidase and
β-galactoside permease.


When cells are switched to a medium containing lactose, the lactose molecules 
enter the cell and bind to the lac repressor protein, causing it to undergo a
conformational change in the part of the repressor protein molecule that is 
required for binding to DNA. In biochemistry this is known as allosteric 
regulation: the regulation of a protein’s activity by the binding of an effector 
molecule (in this case lactose) at a site other than the protein’s active site (in 
this case the active site is the repressor’s DNA binding site). You will learn 
more about this type of regulation in Book 2. The conformationally altered lac
repressor is no longer able to bind to the operator (Figure 6.9c), so the 
promoter becomes available for RNA polymerase binding. 


The control imposed by the lac repressor is known as negative inducible
regulation, because gene expression is induced only when the lac repressor
protein is inactivated by binding to lactose.


■ What would be the consequences for lacZ gene expression in a cell with
a lacI gene mutation that resulted in lac repressor protein unable to bind
to lactose?


□ The cell would be unable to express lacZ, even in the presence of lactose.
If the lac repressor could not bind to lactose, the repressor would always
remain bound to the operator, and RNA polymerase would be unable to
transcribe any of the lac operon genes.


However, transcription of the lac genes also depends on a second mechanism
of regulation that employs a transcriptional activator protein. 


The CAP transcriptional activator 


E. coli ‘prefers’ to metabolise glucose, even if other sugars are present in the 
medium, so expression of the enzymes necessary to metabolise other types
of sugar is not required until all the available glucose is used up.


The signal that reflects the level of glucose is cyclic adenosine
monophosphate (cAMP), a small signalling molecule whose level in the cell is 
inversely proportional to that of glucose; in other words, when glucose is high 
in the cells, cAMP levels are low, and vice versa. In low glucose conditions 
cAMP molecules are available to bind to an E. coli transcriptional activator 




[Figure 6.9 labels. (a) The lac operon: lac control region (including O) followed by lacZ, lacY and lacA. (b) Only glucose present: lac repressor bound to operator blocks RNA polymerase binding to P; no lacZ, Y, A gene expression. (c) Glucose and lactose present: lactose binds to lac repressor, which can no longer bind to O; CAP activator cannot bind to AS; still no lacZ, Y, A gene expression. (d) Only lactose present: cAMP binds to CAP, which in turn binds to AS; RNA polymerase binds to P; lac genes are strongly expressed.]

Figure 6.9 The control of transcription at the lac operon in the presence of
lactose and glucose. (a) The lac operon in the bacterium E. coli (see description in
the text). Only the template DNA strand is shown and the −10 and −35 boxes are
not shown. (b) In the absence of lactose (only glucose is present), the lac repressor
protein (yellow oval) binds to the operator sequence (O) of the lac operon, and
blocks the interaction of RNA polymerase (blue oval) with the promoter (P).
(c) When some lactose is present in the growth medium, it binds to the lac
repressor protein, inducing a conformational change such that the lac repressor
protein is no longer able to bind to the operator sequence. However, if glucose is
also present, the cAMP level remains low so catabolite activator protein (CAP, green
circle) cannot bind to the DNA. (d) When only lactose is present (no glucose),
cAMP levels rise, allowing CAP to bind to the activator sequence (AS), which
facilitates RNA polymerase binding to the promoter (P), and transcription of the
lac genes is initiated.


protein called catabolite activator protein (CAP) which thereby undergoes
an allosteric conformational change allowing it to bind to the lac operon at the
activator site AS (Figure 6.9a). The binding of CAP to the AS is essential for
efficient transcription of the lac operon because CAP facilitates the binding of
RNA polymerase to the promoter. Without CAP binding to AS there is very
little transcription, even if the lac repressor is not bound to the operator
(Figure 6.9c). As long as some glucose is available to the cell, cAMP levels
remain low and the CAP activator cannot bind to the activator site (AS). Thus
RNA polymerase does not bind to the promoter and the lac operon is not
transcribed, even though there is also some lactose present in the growth
medium.


Only when the glucose is depleted does the cAMP level start to rise and alter
the CAP activator so that it binds the activator site (AS), assists the RNA
polymerase to bind to the promoter and initiates transcription. The lac genes
are expressed (Figure 6.9d) and the β-galactosidase and β-galactoside
permease proteins very rapidly reach a high level in the cell. Lactose can then
be imported into the cell (facilitated by β-galactoside permease) and
hydrolysed by β-galactosidase to release galactose and glucose which are
further metabolised to derive energy (Book 2, Chapter 3).


The type of control imposed by the CAP activator is called positive inducible 
regulation, because lac gene expression is induced only when the CAP
activator binds to an inducer, in this case cAMP. 
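
The combined effect of the two mechanisms can be summarised as a small truth table. The Python sketch below is a deliberately simplified, all-or-nothing model of the description above: the function and key names are invented for illustration, and real expression levels are graded ('very little' rather than none).

```python
def lac_operon(glucose: bool, lactose: bool) -> dict:
    """Simplified on/off model of lac operon control as described in the text."""
    # Negative control: without lactose, the lac repressor sits on the operator (O).
    repressor_on_operator = not lactose
    # Positive control: cAMP is high only when glucose is absent,
    # and CAP binds the activator site (AS) only when cAMP is bound to it.
    cap_on_activator_site = not glucose
    return {
        "repressor_on_operator": repressor_on_operator,
        "cap_on_activator_site": cap_on_activator_site,
        "lac_genes_transcribed": cap_on_activator_site and not repressor_on_operator,
    }


for glucose in (True, False):
    for lactose in (True, False):
        state = lac_operon(glucose, lactose)
        print(f"glucose={glucose!s:5} lactose={lactose!s:5} "
              f"transcribed={state['lac_genes_transcribed']}")
```

Only the combination of lactose present and glucose absent gives transcription, which corresponds to Figure 6.9d.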


6.3.4 Termination of transcription of bacterial genes 


Once initiated, RNA transcript elongation progresses until the RNA 
polymerase encounters a transcription terminator sequence in the RNA. The 
terminator will be located after the protein-coding region in an mRNA, and in 
E. coli, transcription terminates at the end of the last protein-coding region of 
an operon. 


Bacterial RNAs have two types of transcriptional terminator. One type is a 
binding site for an RNA binding protein called Rho factor, a helicase
(Section 5.3) that unwinds the RNA from the DNA template and disrupts the
complementary base pairing between them. This terminates transcription and 
releases the RNA transcript. The other type of terminator is an inverted repeat 
sequence which, as the RNA is transcribed, folds back on itself to form a 
hairpin-shaped loop by complementary base pairing within the RNA strand 
(Figure 6.10). The hairpin structure destabilises the interaction between the 
RNA polymerase and the DNA-RNA hybrid, thus terminating transcription. 
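
The idea of an inverted repeat folding back on itself can be sketched in Python: a stretch of RNA can form a hairpin stem if it is followed, after a short loop, by its own reverse complement. The example sequence, the stem length and the loop limits below are arbitrary assumptions, and the sketch only finds perfectly complementary stems (it ignores G-U pairs and mismatches).

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_hairpin(rna: str, stem: int = 6, min_loop: int = 3, max_loop: int = 8):
    """Return (stem_start, loop_length) of the first perfect inverted repeat.

    Returns None if no stem of the requested length is found.
    """
    for i in range(len(rna) - 2 * stem - min_loop + 1):
        target = reverse_complement(rna[i:i + stem])
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop  # where the second arm of the stem would start
            if rna[j:j + stem] == target:
                return i, loop
    return None


# Hypothetical transcript: flank, stem arm, 4-base loop, complementary arm, flank.
TRANSCRIPT = "AAUU" + "GCCGCG" + "UUCG" + "CGCGGC" + "UUUU"

print(find_hairpin(TRANSCRIPT))    # (4, 4)
print(find_hairpin("A" * 24))      # None
```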


Premature (early) termination of transcription is also commonly used by 
bacteria as a mechanism of downregulating gene expression; this is known as 
attenuation. Attenuators are terminator sequences that form hairpin-shaped 
structures close to the 5’ end of the RNA, often in response to an 
environmental signal. The hairpin prevents transcription of the full length 
RNA. For example, in the presence of high levels of the amino acid 
tryptophan, an attenuator hairpin forms close to the 5’ end of the polycistronic 
mRNA transcribed from the trp operon (which encodes the components for
production of tryptophan). This prevents transcription of the full length trp
mRNA, so no protein translation can occur, and no tryptophan is synthesised.
The trp mRNA is only fully transcribed and translated when tryptophan levels
are low, so the attenuator hairpin is not present and the bacteria can synthesise
the tryptophan they require.




Figure 6.10 Termination of mRNA synthesis by the formation of a hairpin loop 
structure within a transcription terminator sequence in the transcribed RNA strand. 


Summary of Section 6.3 


• Gene transcription starts at a precise place in the gene called the
transcription start site.

• Bacterial RNA polymerase has five subunits: two large subunits (β and β′),
two identical α subunits, and the regulatory σ factor.

• RNA polymerase binds to a special sequence of bases in the core
promoter, slightly upstream of the transcription start site. Bacterial
promoters have two consensus sequences where the polymerase binds;
these are the −10 and the −35 boxes.

• The σ factor subunit is responsible for recognising the −10 and −35 box
consensus sequences.

• In bacteria, genes for related functions are often grouped together in units
called operons. Genes in an operon are regulated together, and are under
the control of the same promoter.

• Transcription factors (activators and repressors) regulate bacterial operons
by binding to specific sites in the promoter (e.g. the operator and activator
sites of the lac operon).

• In the lac operon, the lac repressor and the CAP-cAMP activator complex
regulate transcription by binding to the operator site and the activator site,
respectively. In the absence of lactose, the lac repressor binds to the
operator and inhibits transcription (negative inducible regulation). In the
presence of lactose the lac repressor can no longer bind to the DNA.
Binding of CAP-cAMP to the activator site activates transcription
(positive inducible regulation) once no glucose is present.

• In bacteria, termination of transcription occurs at terminator sequences
where a terminator protein, Rho helicase, separates the RNA transcript
from the DNA, or where disruptive secondary structures (hairpin loops)
form in the transcribed mRNA. Attenuation, premature termination
resulting in truncated mRNA molecules that are not translated, is a
mechanism by which transcription is downregulated in bacteria.


6.4 Control of eukaryotic gene transcription 


Gene regulation in eukaryotes is generally far more complex than in bacteria. 
In a multicellular organism, as well as controlling the continual changes that 
each cell must make in response to its environment, gene regulation is also 
responsible for the programme of cell differentiation that generates multiple 
cell types. You will learn more about differentiation in Book 3. 


Gene regulation in an individual cell is also modulated by many different 
types of signals that originate from surrounding cells or from elsewhere in the 
organism. For example, all multicellular organisms produce growth factors 
and hormones (phytohormones in plants) — signalling molecules that bind to 
cells carrying appropriate receptor proteins. These molecules activate a signal 
transduction mechanism inside the cell that ultimately alters gene expression 
and leads to cell type-specific responses, such as a change in metabolism. You 
will learn more about signal transduction mechanisms in Book 2, Chapter 4. 


Eukaryotic gene expression is regulated at multiple stages and the 
complexities are well beyond the scope of this module; but in this section, the 
main mechanisms that control eukaryotic gene transcription will be discussed. 
The regulation of post-transcriptional mRNA modification and protein 
translation will be briefly described in the remainder of the chapter. 


Transcription in eukaryotes differs in three important ways from that in 
prokaryotes. 


(a) Eukaryotic polymerases cannot bind directly to DNA because they do not
contain a subunit equivalent to the prokaryotic σ factor which facilitates
polymerase binding to the core promoter (Section 6.3.1). In eukaryotes,
this function is instead carried out by a number of general transcription
factors which must assemble with the RNA polymerase at the promoter
before transcription can be initiated.

(b) Prokaryotic DNA is essentially ‘naked’ and freely accessible to the
transcription proteins, but in higher eukaryotes the nuclear DNA is
packaged with histone proteins to form densely packed chromatin
(Section 3.4.3). Before transcription can take place, eukaryotic DNA must
therefore be made accessible to the transcription machinery.


(c) The level of transcription of eukaryotic genes is regulated by multiple 
specific transcription factors that bind to regulatory DNA sequences 
spaced at intervals along the DNA, some of which may be thousands of 
base pairs away from the gene that they influence. These regulatory 
transcription factors work by facilitating the processes described in (a) and 
(b) above. 




6.4.1 The eukaryotic transcription initiation complex 


There are several types of eukaryotic RNA polymerases, which transcribe 
different types of genes. The transcription products of the three most 
important eukaryotic polymerases are summarised in Table 6.1. RNA 
polymerase II is the polymerase that transcribes most of the protein-coding 
genes, and the mechanisms described in this section relate to this enzyme. 


Table 6.1 RNA polymerases in eukaryotic cells. 


Type of RNA polymerase    Transcription product

RNA polymerase I          rRNA
RNA polymerase II         mRNA and most small nuclear RNA (snRNA) and microRNA
RNA polymerase III        tRNA, small ribosomal RNA, and other small RNAs found in the nucleus and cytosol


RNA polymerase II typically contains 10-12 subunits. Three of these are core 
subunits with sequence homology (i.e. their protein sequences show similarity) 
to the E. coli core polymerase subunits (ααββ′, Section 6.3.1). This suggests
that the structure of the core polymerase subunits is highly conserved across 
all species. The exact roles of some of the smaller subunits are poorly 
understood. Some of them are common to all types of eukaryotic polymerases 
while others are specific to a particular type of RNA polymerase. 


The core promoter of a protein-coding eukaryotic gene typically contains a 
highly conserved sequence called the TATA box, usually between 25 and
35 bp upstream of the transcription start site (Figure 6.11), so called because
the most frequently occurring bases in the TATA box are the six-base 
sequence TATAAA. 


Figure 6.11 The transcription control elements that control protein-coding gene expression in a typical eukaryotic 
gene. Only the template DNA strand is shown. RNA polymerase II binds to the core promoter, which includes the 
TATA box and the transcription start site, while regulatory transcription factors bind to the proximal promoter region 
about 250 bp upstream of the transcription start site, or to more distant regulatory elements, enhancers and silencers, 
which may be many kilobase pairs (kbp) distant from the transcription start site. 




■ How does the TATA box sequence compare with the consensus sequence
of the −10 box of the prokaryotic promoter shown in Figure 6.7?


□ There is homology (sequence similarity) between these two consensus
sequences.


This suggests that the sequence has been tightly conserved during evolution as 
the binding site for the RNA polymerase complex in all organisms. RNA 
polymerase II and a group of proteins called general transcription factors (to 
distinguish them from the specific transcription factors that regulate the level 
of transcription and which will be described later on) must bind to the core 
promoter to form the transcription initiation complex. The first stage of 
transcriptional initiation is the recognition of the TATA box by the general 
transcription factor TFIID (transcription factor IID, where II is the Roman 
numeral two). This is a protein complex consisting of a TATA binding protein 
(TBP) and a number of TBP-associated factors (TAFs) (Figure 6.12). 
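
The similarity between the TATA box and the prokaryotic −10 box noted above can be checked position by position. This minimal Python sketch simply counts identical bases in two equal-length sequences; the function name is an assumption for illustration, and the two sequences are the consensus strings quoted in the text.

```python
def shared_positions(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences carry the same base."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return sum(1 for x, y in zip(a, b) if x == y)


TATA_BOX = "TATAAA"      # eukaryotic TATA box consensus quoted in the text
MINUS_10_BOX = "TATAAT"  # prokaryotic -10 (Pribnow) box consensus quoted in the text

print(shared_positions(TATA_BOX, MINUS_10_BOX), "of", len(TATA_BOX))  # 5 of 6
```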


[Figure 6.12 labels: DNA template strand with TATA box, 3′ to 5′; TFIID (containing TBP) recognises the TATA box; further general transcription factors and TFIIF/RNA polymerase II are added; formation of the initiation complex. Some labels are illegible in this copy.]


Figure 6.12 Assembly of the RNA polymerase II transcription initiation complex 
on a TATA-containing promoter (see description in the text). Only the template 
DNA strand is shown. 




A number of other general transcription factors (you don’t need to memorise 
all these) are then ‘recruited’ to the promoter and facilitate the binding of 
RNA polymerase II and separation of the strands of DNA, thereby allowing 
the transcription bubble to form. The RNA polymerase II bound to the 
promoter becomes phosphorylated, causing it to undergo a conformational 
change which releases it from the initiation complex to move along the DNA 
and start elongating the new mRNA chain. Most of the general transcription 
factors are released from the DNA at this point so that they are available to 
initiate transcription at another site. 


You may be surprised to learn that the transcription initiation complex, 
including the polymerase, consists of at least 40 polypeptides at this stage. 


To summarise, the three major roles of general transcription factors are:

• positioning the RNA polymerase correctly at the promoter

• separation of the two strands of DNA to allow transcription to begin

• release of the RNA polymerase from the initiation complex to start
transcription.


6.4.2 Specific transcription factors regulate eukaryotic gene
expression 


The level of expression of a eukaryotic gene is, as in prokaryotes, mainly 
determined by transcriptional activator and repressor proteins that regulate 
transcriptional initiation (the binding of the polymerase to the core promoter). 
Regulation of initiation in a eukaryotic gene is far more complex than in a 
bacterial operon. 


Some eukaryotic genes, often called housekeeping genes, are expressed 
constitutively (at a relatively constant level) in most cell types. These tend to 
be genes involved in some of the basic functions necessary for the 
maintenance of the cell: for example, genes that encode ribosome components 
and many of the metabolic enzymes. Most genes, however, are inducible,
meaning that they are only expressed in particular cell types at particular 
times, or in response to signals such as growth factors. 


Transcriptional initiation is controlled by regulatory DNA elements that are 
recognised and bound by specific transcription factors in the appropriate 
circumstances. Transcription factors can be constitutively active or they may 
require activation by intracellular or extracellular signals, for example, the 
‘response factors’ shown in Table 6.2 that are activated in response to growth 
factors circulating in the blood serum or to cellular stresses, such as increased 
temperature. Some transcription factors are ubiquitous, that is, they are 
expressed in all types of cell, while others are cell-type specific. Most genes 
are only expressed when a particular combination of several different 
transcription factors binds to the appropriate regulatory elements. To illustrate 
the diversity of these transcription factors and the regulatory DNA elements to 
which they bind, a few examples are given in Table 6.2. 




Table 6.2 Examples of specific transcription factors and the regulatory DNA 
sequences they recognise. (N = any nucleotide) 


Factor                   Regulatory DNA sequence recognised   Comments

Constitutive transcription factors
Sp1                      5′ GGGCGG 3′                         ubiquitous
Oct-1                    5′ ATGCAAAT 3′                       ubiquitous

Response factors
serum response factor    5′ CCATATTAGG 3′                     activated by growth factors in serum
heat shock factor        5′ CNNGAANNTCCNNG 3′                 activated by heat shock

Cell-specific factors
GATA-1                   5′ GATA 3′                           specific to erythroid cells (which give rise to red blood cells)
MyoD1                    5′ CANNG 3′                          specific to myoblast cells (which give rise to skeletal muscle)
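
The recognition sequences in Table 6.2 can be treated as simple patterns in which N matches any nucleotide. The Python sketch below converts such a motif into a regular expression and lists where it occurs in a sequence. The DNA fragment and the function name are invented for illustration, and only one strand is searched, whereas a real binding site could lie on either strand.

```python
import re


def find_sites(sequence: str, motif: str) -> list:
    """Return the 0-based start index of every (possibly overlapping) motif match.

    'N' in the motif matches any of A, C, G or T.
    """
    body = "".join("[ACGT]" if base == "N" else base for base in motif)
    pattern = re.compile(f"(?=({body}))")  # lookahead so overlapping sites are found
    return [match.start() for match in pattern.finditer(sequence)]


# Hypothetical fragment containing one heat shock element and one GATA site.
FRAGMENT = "AA" + "CTGGAATATCCAGG" + "TT" + "GATA" + "GG"

print(find_sites(FRAGMENT, "CNNGAANNTCCNNG"))  # [2]
print(find_sites(FRAGMENT, "GATA"))            # [18]
print(find_sites(FRAGMENT, "GGGCGG"))          # []
```

Short motifs such as GATA occur by chance very often in a genome, which fits the point made in the text that most genes are expressed only when a particular combination of factors is bound.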


As described in the previous section, eukaryotic protein-coding genes have a 
core promoter where the general transcription factors and the polymerase 
assemble. Most also have an upstream promoter region (sometimes called the 
proximal promoter) about 250 bp upstream of the TATA box (Figure 6.11) 
which contains binding sites for specific transcription factors (which can be 
activators or repressors) that are necessary to control the level of gene
transcription. 


Many eukaryotic genes also have a much larger and more variable ‘distal 
promoter region’ containing regulatory elements. These may be up to several 
thousand bases upstream of the transcription start site (Figure 6.11). In 
addition, there are regulatory DNA elements called enhancers and silencers 
that may be situated quite close to the promoter but can be much further from 
the gene, sometimes up to 50-100 kbp distant. Unlike promoter regulatory 
elements, enhancers and silencers can work in either orientation and may 
affect more than one gene. Enhancers are bound by transcriptional activator 
proteins which stimulate gene transcription. Silencers bind transcriptional 
repressor proteins and have the opposite effect of reducing gene transcription. 
Enhancers and silencers are very common in eukaryotes, but rare in bacteria.
The first enhancer to be discovered was that encoded by simian virus 40 
(SV40), a virus (with a DNA genome) that infects eukaryotic cells. The 
enhancer was identified by joining up regions of DNA to core promoter 
sequences to see which combinations stimulate transcription of a ‘reporter’ 
gene (Box 6.1). The SV40 enhancer has been found to be capable of 
activating the transcription of many eukaryotic genes, and is often used by 
researchers as a component of plasmids employed to express recombinant 
proteins in cultured eukaryotic cells. Another example of an enhancer relates 
to the regulation of genes encoding antibodies (immunoglobulins). These 
genes contain an enhancer that can stimulate transcription, but is active only 
in B cells (the type of immune system cell that makes antibodies). 




Generating Diversity 


Box 6.1 Reporter genes for analysis of gene expression 


In order to study the regulatory potential of DNA sequences in a gene 
promoter region, sections of the DNA sequence can be isolated, linked to 
an easily detectable reporter gene and reintroduced into cultured cells, or 
even animals or plants. The level of expression of the reporter gene in 
the cell is an indication of whether the DNA sequence contains elements 
that can act as constitutive or regulatory promoter elements. 


Certain genes are routinely used as reporters because the characteristics 
they confer on the cells expressing them are easily identified and 
measured. Common reporter genes are the green fluorescent protein 
(GFP) gene isolated from jellyfish, which causes the cells that express it 
to glow green when exposed to ultraviolet light (Figure 6.13), and the 
gene encoding luciferase, an enzyme that oxidises a pigment called 
luciferin to release light, which can be measured using a luminometer. 
Another common reporter is β-galactosidase, an enzyme that cleaves the
sugar lactose (Section 6.3.3). A colourless modified form of lactose
called Xgal is added to the cells, and if β-galactosidase is expressed, it
converts the Xgal into a blue product (Box 5.2) that can be detected by
absorbance of light using a spectrophotometer.


[Figure 6.13 diagram labels: GFP coding region; insert regulatory region into a DNA plasmid upstream of the reporter gene (GFP); measure intensity of green fluorescent light.]

Figure 6.13 Reporter gene assay for gene expression. A plasmid (Box 5.2) carrying regulatory DNA sequences 
fused to the coding region of the green fluorescent protein (GFP) gene is introduced into cells which transcribe and 
translate the GFP gene only if the regulatory DNA sequence is active in the host cells, i.e. if the appropriate 
regulatory transcription factors are present. The amount of GFP protein produced therefore reflects the activity of the 
regulatory DNA sequence and can be measured by exposing the cells to UV light and using a fluorescence microscope
or a fluorimeter to measure the intensity of the green fluorescent light emitted by the cells. Shown here are multiple 
cell samples fluorescing in a 96-well plate. 




6.4.3 Coordinating the initiation of transcription 


How do these various regulatory DNA sequences and transcription factors 
work together to regulate transcription? In many cases, a large complex of 
more than 20 proteins called the mediator complex is required to act as an 
intermediary (Figure 6.14) between the general transcription machinery (the 
RNA polymerase and the general transcription factors bound at the core 
promoter) and specific transcription factors bound to the proximal promoter or 
to more distant enhancers and silencers. It is assumed that the DNA bends and 
loops such that these transcription factors are brought into contact with the 
mediator complex. 


[Figure 6.14 diagram labels: TATA box; start of transcription; binding of general transcription factors, mediator complex and RNA polymerase; spacer DNA; RNA polymerase and general transcription factors.]

Figure 6.14 Combinatorial control of eukaryotic gene expression by multiple transcription factors binding to 
regulatory elements. DNA bending at the promoter allows specific transcription factors bound to both local and 
distant regulatory DNA elements to interact with the RNA polymerase II complex, in many cases through binding 
to the mediator complex. Only the DNA template strand is shown; broken lines indicate long stretches of
intervening DNA.




The whole complex of transcriptional activators, repressors and mediator 
complex is thus brought into contact with the RNA polymerase, and the 
different components of the complex interact to either stabilise or destabilise 
the assembly of the polymerase and general transcription factors at the core 
promoter. In addition, some transcriptional regulators also attract proteins that 
modulate chromatin structure to change the accessibility of the promoter to the 
general transcription factors and the polymerase, as will be discussed in the 
next section. 


Eukaryotic transcription factors therefore largely work as an organised group 
to determine the expression of a gene in the appropriate cell, at the 
appropriate time and at the required level. This is often referred to as 
combinatorial control, and it has the advantage that many different genes can 
be controlled in complex ways with different combinations of a relatively 
small number of regulatory proteins. In many cases, however, it is useful to 
have a final key transcription factor that is able to switch a large number of 
genes on or off together in a coordinated fashion. An example of this is the 
glucocorticoid receptor, a protein that is expressed in almost all mammalian 
cells, but only binds to a specific regulatory DNA sequence when it forms a 
complex with a glucocorticoid hormone, for example cortisol which is 
released from the mammalian adrenal gland in response to stress. In the 
presence of cortisol, the glucocorticoid receptor coordinates the switching on 
of many different genes that increase the catabolism of sugars and fats to 
increase blood glucose, and also suppress the immune system; hence synthetic 
forms of cortisol are used to treat certain inflammatory diseases. You will 
learn more about the important role of combinatorial control and key 
transcriptional regulators in cell differentiation and development in Book 3, 
Chapter 1. 
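The logic of combinatorial control can be sketched as a set of simple rules. The following Python is a deliberately simplified, all-or-nothing illustration: the gene names, the required factor combinations and the repressor are hypothetical, and real promoters integrate graded inputs rather than behaving as strict on/off switches.

```python
# Hypothetical genes and the factors each one needs (assumptions for illustration).
GENE_RULES = {
    "gene_A": {"activators": {"Sp1", "GATA-1"}, "repressors": set()},
    "gene_B": {"activators": {"Sp1", "MyoD1"}, "repressors": set()},
    "gene_C": {
        "activators": {"Sp1", "glucocorticoid receptor"},
        "repressors": {"repressor_X"},
    },
}


def expressed_genes(active_factors):
    """Return the genes whose required activators are all present and
    none of whose repressors is present."""
    active = set(active_factors)
    switched_on = []
    for gene, rule in GENE_RULES.items():
        has_all_activators = rule["activators"] <= active
        has_a_repressor = bool(rule["repressors"] & active)
        if has_all_activators and not has_a_repressor:
            switched_on.append(gene)
    return switched_on


# An erythroid-like cell: ubiquitous Sp1 plus cell-specific GATA-1.
print(expressed_genes({"Sp1", "GATA-1"}))
# A muscle-precursor-like cell exposed to cortisol.
print(expressed_genes({"Sp1", "MyoD1", "glucocorticoid receptor"}))
```

Note how three regulatory proteins, combined in different ways, are enough to give each hypothetical cell type its own pattern of expressed genes.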


6.4.4 Accessibility of the eukaryotic DNA to the 
transcription machinery 


You will recall from Chapter 3 (Figure 3.12) that in eukaryotes, genomic 
DNA is packaged into chromosomes by wrapping the DNA around histone 
proteins to form structures called nucleosomes that are spaced along the DNA. 


= What are the names given to transcriptionally inactive and active regions 
of chromatin structure? 


© Heterochromatin is more densely packaged and condensed, generally 
contains few genes and is transcriptionally inactive, while euchromatin is 
more loosely packed and is where active gene transcription is taking 
place. 


Wrapping of the DNA around histones imposes an obstacle to the binding of 
other proteins, such as the RNA polymerase and general transcription factors. 
In fact, some of the specific transcription factors activate promoters by 
changing the local structure of chromatin at the promoter to improve access to 
the DNA. This type of transcriptional activator binds to the histone proteins in 
nucleosomes and attracts enzymes called histone acetylases which catalyse the 
addition of acetyl groups (CH3CO-) to a particular type of amino acid (lysine)
in the histone protein (there are also transcriptional repressors that attract
histone deacetylases, which remove these acetyl groups). The acetylation of
histones is a signal that attracts chromatin-remodelling proteins whose 
function is to remove the nucleosomes or slide them apart to expose the DNA 
(Figure 6.15), allowing transcription factors and the RNA polymerase to bind 
to the DNA. 


[Figure 6.15 diagram labels: nucleosome; DNA; histone protein; histone proteins modified by histone acetylase; histone proteins reorganised or removed; transcription initiation complex can bind.]

Figure 6.15 Some transcriptional activators ‘recruit’ histone acetylases which modify histone proteins. The histone 
modification attracts chromatin-remodelling proteins to the promoter region which reorganise or remove histone 
proteins, making the DNA more accessible to the RNA polymerase. 




The British scientist Ed Southern invented a technique, called Southern
blotting, in which DNA is separated by gel electrophoresis then transferred
to a membrane. The corresponding technique used for RNA analysis was named
‘northern’ blotting in a play on words.




6.4.5 The study of coordinate 


Detecting the presence of the RNA transcript of a gene in a cell or tissue is 
the most direct way of determining whether the gene is active. To understand 
the role of individual genes in a particular type of cell or tissue, researchers 
have developed several methods for studying the level of mRNA of a single 
gene, or of multiple genes simultaneously (Box 6.2). 


Box 6.2 Methods for studying gene expression 


Northern blotting 

The technique of northern blotting is used to separate and identify 
individual cellular mRNAs (Figure 6.16). First, all of the RNA is 
extracted from a sample of cells or tissue. Then the RNA is ‘loaded’ into 
a gel made of agarose, and an electric current is applied across the gel 
such that the negatively charged RNA molecules (negative because of the 
phosphate groups in their sugar-phosphate backbone) are attracted towards the
‘positive’ electric pole. As the RNA molecules move through the gel 
towards the positive pole, the gel acts like a sieve, separating the RNA 
molecules on the basis of size (a technique called gel electrophoresis); 
small molecules move further through the gel than larger molecules. 


The ‘smear’ of RNA molecules, separated according to size, is then 
transferred from the gel onto the surface of a nylon filter membrane. This 
membrane is then incubated with a solution containing a small DNA 
‘probe’ sequence that is complementary to part of the mRNA of interest; 
a technique called nucleic acid hybridisation. The probe will interact 
only with the specific target mRNA sequence. The probe is first 
‘labelled’ by incorporating either radioactive nucleotides, or nucleotides 
that are coupled to enzymes that can produce a coloured reaction 
product. If the mRNA of interest is present on the surface of the filter 
membrane, the labelled probe anneals with it by complementary base 
pairing, forming a DNA-RNA duplex. The location of the labelled probe 
can then be visualised using autoradiography (exposure to X-ray 
detecting film) in the case of a radioactive probe, or by the addition of a 
suitable chromogenic substrate for an enzyme-labelled probe. The main 
steps in northern blotting are shown in Figure 6.16. 
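Complementary base pairing between a DNA probe and its target mRNA can be sketched in a few lines of Python. This is a minimal illustration that assumes perfect complementarity over the whole probe; the sequences and function names are invented for the example.

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def target_rna_for_probe(dna_probe):
    """Return the RNA sequence (written 5' to 3') that base-pairs with a
    DNA probe written 5' to 3'. The two strands of a duplex are
    antiparallel, so the probe is reversed before being complemented."""
    reverse_complement = "".join(
        DNA_COMPLEMENT[base] for base in reversed(dna_probe.upper())
    )
    return reverse_complement.replace("T", "U")  # RNA has U in place of T


def probe_hybridises(mrna, dna_probe):
    """True if the mRNA contains a stretch fully complementary to the probe."""
    return target_rna_for_probe(dna_probe) in mrna.upper()


# Hypothetical sequences, far shorter than a real probe or transcript.
probe = "TTAGGC"
print(target_rna_for_probe(probe))
print(probe_hybridises("AUGGCCUAAGG", probe))
print(probe_hybridises("AUGAAACCC", probe))
```

In practice, probes are much longer than this, and hybridisation conditions determine how many mismatches are tolerated.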


Figure 6.16 Northern blotting. (1) RNA is isolated. (2) The RNA molecules 
are separated on a size basis by agarose gel electrophoresis. RNA ‘markers’ 
are included so that the sizes of RNA species in the samples can be 
estimated. (3) The gel is compressed against a nylon filter membrane so that,
as the liquid from the gel soaks into the membrane by capillary action, the
RNA moves out and is ‘blotted’ onto the membrane. (4) The blot is 
hybridised with a labelled probe to identify specific RNA molecules. Finally 
the filter membrane is processed to visualise the labelled probe attached to 
the target RNA molecule. 




The advantages of the technique are that it identifies the size of the 
mRNA (by comparison with standard ‘marker’ RNAs of known size) and 
hence allows discrimination of alternatively spliced transcripts
(Section 6.5.2). However, it can’t be used to accurately quantify the
amount of RNA and doesn’t detect very low levels of mRNA.
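Estimating the size of an mRNA by comparison with marker RNAs can be sketched as follows. The sketch assumes the common approximation that the logarithm of molecular size falls roughly linearly with the distance migrated through the gel; that assumption, the marker sizes and the distances are all illustrative and are not taken from this chapter.

```python
import math


def fit_line(xs, ys):
    """Least-squares straight line through the points: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_size(markers, distance_mm):
    """Estimate transcript size (nucleotides) from its migration distance.

    markers: list of (size in nucleotides, distance migrated in mm).
    """
    distances = [distance for _, distance in markers]
    log_sizes = [math.log10(size) for size, _ in markers]
    slope, intercept = fit_line(distances, log_sizes)
    return 10 ** (intercept + slope * distance_mm)


# Hypothetical marker lane: smaller RNAs have migrated further.
markers = [(8000, 10.0), (4000, 20.0), (2000, 30.0), (1000, 40.0), (500, 50.0)]

print(round(estimate_size(markers, 25.0)))  # a band lying between the 4000 and 2000 markers
```

Two splice variants of the same gene would appear as bands at different distances, and so would be assigned different estimated sizes.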


RT-PCR and qPCR 


RT-PCR (reverse transcription polymerase chain reaction) has 
revolutionised the study of gene expression, making it possible to detect 
the RNA transcript of almost any gene, regardless of the scarcity of the 
starting material or relative abundance of the mRNA in the cell. It is a 
variant of the PCR technique described in Box 5.1. First, the total 
cellular mRNA is purified and used as a template for reverse 
transcriptase enzyme to generate DNA copies of the mRNAs called 
complementary DNA (cDNA). 


= What is a source of reverse transcriptase? 


© This enzyme is produced by retroviruses which need it to produce a
DNA copy of their RNA genome (Section 5.9). 


The cDNA copy of the mRNA of interest can then be amplified using 
gene-specific primers in a PCR reaction (Figure 6.17). It is possible to 
accurately quantitate the amount of a particular mRNA by a quantitative 
version of PCR (qPCR). This is now routinely carried out using PCR 
primers labelled with a fluorescent ‘tag’. Comparing the amount of 
fluorescent PCR product to a parallel ‘standard’ PCR reaction with a 
known starting amount of mRNA can produce an absolute measurement 
of the number of copies of the original target mRNA, typically in copies 
per cell. Some quantitative PCR techniques are referred to as ‘real-time 
PCR’ (often also rather confusingly abbreviated to RT-PCR) because the 
accumulation of the fluorescent PCR product can be monitored in real 
time. 

The advantage of transcript detection by RT-PCR and qPCR is that it is 
quantitative and very sensitive (detection of a single mRNA molecule is 
possible), but it can be technically challenging, requiring careful design 
of experimental controls, because its extreme sensitivity means that even 
minute amounts of contamination by genomic DNA can produce 
misleading results. 
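Quantification against a standard can be sketched as a calibration calculation. The Python below assumes one widely used approach that is not spelled out in this box: each reaction is summarised by the cycle number at which its fluorescence crosses a fixed threshold, a dilution series of the standard gives a straight line of threshold cycle against log10 of starting copies, and the test sample is read off that line. All the numbers and names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Least-squares straight line through the points: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def copies_from_threshold_cycle(standards, test_cycle):
    """Estimate starting copies in a test sample from a standard curve.

    standards: list of (known starting copies, threshold cycle).
    test_cycle: threshold cycle measured for the test sample.
    """
    log_copies = [math.log10(copies) for copies, _ in standards]
    cycles = [cycle for _, cycle in standards]
    slope, intercept = fit_line(log_copies, cycles)
    # Invert the line: cycle = slope * log10(copies) + intercept.
    return 10 ** ((test_cycle - intercept) / slope)


# Hypothetical ten-fold dilution series of the standard RNA.
standards = [(1e6, 15.0), (1e5, 18.32), (1e4, 21.64), (1e3, 24.96)]

print(round(copies_from_threshold_cycle(standards, 20.0)))
```

A sample that needs more cycles to reach the threshold started with fewer template copies, which is why the fitted slope is negative.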


[Figure 6.17 graph: curves for standard RNA, test RNA and a negative control plotted against the number of amplification cycles (10 to 50). The quantity of PCR product is calculated by comparison with a standard amount of starting RNA template.]



DNA microarrays 


The number of different genes that are activated in a single biological 
process may be huge. For example, during the process of transition from 
aerobic to anaerobic respiration (Book 2, Chapter 3) in the yeast
S. cerevisiae, changes in the expression of 1740 genes have been
recorded. The expression of multiple genes can be analysed
simultaneously using a technique known as a DNA microarray or DNA 
‘chip’. 

A DNA chip is a thin layer of silicon, 2 cm² or less in area, carrying a
large number of individual DNA probes (usually more than 1000), each 
one with a different sequence representing a particular gene, and each at 
a defined position on the chip. The probes are synthetic single-stranded 
oligonucleotides that are spotted onto the silicon surface, using high- 
speed robots, to form the ‘microarray’. The chip is incubated with cDNA 
prepared from total cellular mRNA using reverse transcriptase (see RT-PCR
above) and which has been labelled, usually with a fluorescent
molecule. Hybridisation takes place between each probe and any of its
target cDNA that is present in the sample. To determine how much target
cDNA has hybridised to each probe spot, the surface of the chip is
scanned, and the intensity of the fluorescent signal emitted by bound
target cDNA is detected and recorded (Figure 6.18).


The DNA microarray analysis technique can be used to compare the 
pattern of mRNA expression in two different conditions: for example, the 
same cells treated in two different ways, cells from two different tissues, 
or diseased versus normal tissues (as illustrated in Figure 6.18). Studies 
on cancerous tissue using this technique discovered 299 genes with 
expression patterns that differed significantly when normal colon 
epithelial cells were compared with colon cancer cells. About half of 
these genes also showed abnormal expression levels in pancreatic cancer 
cells. The implication of this study is that some genes are abnormally 
expressed in more than one type of cancer, whereas others are 
abnormally expressed only in specific cancers. Because microarrays can 
be used to examine the expression of hundreds or thousands of genes at 
once, this technique has revolutionised the way scientists examine gene 
expression, and identify genes involved in diseases. 




[Figure 6.18 diagram labels: cancer cells and normal cells; isolate mRNA; synthesis and labelling of cDNA using reverse transcription; red fluorescent cDNA and green fluorescent cDNA; combine targets; hybridise to microarray.]


Figure 6.18 DNA microarrays allow the expression level of multiple genes 
to be determined in a single assay. cDNA is prepared by reverse 
transcription of template mRNA purified from cells or tissues, labelled with 
a fluorescent molecule and hybridised to the microarray chip. Labelled 
cDNAs bound to the individual DNA probes on the chip are detected by
fluorescence microscopy, and the intensity of the fluorescence signal 
measured. This example shows two-colour detection to compare gene 
expression in normal and cancer cells. The cDNAs prepared from the two 
cell types have been labelled with different fluorescent probes (red and 
green) so that differences in gene expression between the two cell types are 
detected by the relative intensity of red and green fluorescence associated 
with each DNA probe spot on the chip. 
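The comparison of red and green fluorescence at each probe spot can be sketched as a ratio calculation. The Python below is an illustration only: it expresses the relative intensity as a log2 ratio and treats a two-fold difference as noteworthy, which is a convention assumed here rather than stated in the text, and the gene names and intensities are invented.

```python
import math


def classify_spots(spots, threshold=1.0):
    """Classify each probe spot by its log2(red / green) intensity ratio.

    spots: dict mapping gene name to (red intensity, green intensity).
    threshold: log2 ratio treated as a real difference (1.0 = two-fold;
               an assumed cut-off, not a value from the text).
    """
    result = {}
    for gene, (red, green) in spots.items():
        ratio = math.log2(red / green)
        if ratio >= threshold:
            result[gene] = "higher in red-labelled sample"
        elif ratio <= -threshold:
            result[gene] = "higher in green-labelled sample"
        else:
            result[gene] = "similar in both samples"
    return result


# Hypothetical background-corrected intensities for three probe spots.
spots = {
    "gene_1": (800.0, 200.0),
    "gene_2": (100.0, 400.0),
    "gene_3": (300.0, 310.0),
}

for gene, call in classify_spots(spots).items():
    print(gene, call)
```

Using a log ratio means that a four-fold increase and a four-fold decrease give values of the same size but opposite sign (+2 and -2), which makes the two directions of change directly comparable.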




6.4.6 Gene control at the level of transcriptional elongation 
and termination 


Once initiation is complete, the process of elongating the new RNA strand is 
similar in both prokaryotes and eukaryotes (see Section 6.3.1). Elongation is 
not a smooth process. The polymerase transcribes rapidly through some 
sequences but pauses at others. 


= What features of eukaryotic DNA might impede the RNA polymerase? 


© Regions where the DNA is wrapped around nucleosomes. 


During elongation in eukaryotic cells, the RNA polymerase is associated with 
elongation factor proteins that reorganise the nucleosomes. This allows the 
polymerase to progress along the DNA and ensures that it doesn’t dissociate 
from the DNA before it reaches the end of the transcription unit. 


Considerably less is known about the control of transcription termination in 
eukaryotes than in bacteria. The three RNA polymerases use different 
mechanisms. Termination of the transcription of mRNAs by RNA polymerase 
II is coupled to the process that polyadenylates the 3’ end of the transcript 
(Section 6.5.1 below). 


There are also some examples in eukaryotes where protein levels are regulated 
by a mechanism that results in premature termination of transcription. An 
example of this occurs during transcription of the HIV-1 (AIDS virus) genome 
in the eukaryotic cells that it infects. During early infection, when the disease
is not fully developed, the viral mRNA molecules terminate prematurely, and
are not translated into functional proteins, so few new virus particles are 
made. However, in the later stages of infection, high levels of full-length 
transcripts are produced because premature termination is overridden by an 
HIV-1 protein called Tat, a mechanism referred to as anti-termination. 


Summary of Section 6.4 


• There are three forms of the enzyme RNA polymerase in eukaryotes,
namely RNA polymerase I, II and III. RNA polymerase II transcribes
protein-coding genes.

• All three eukaryotic RNA polymerases contain three types of core subunit
(with sequence homology to prokaryotic RNA polymerase subunits) and a
number of smaller subunits.

• The core promoters of most eukaryotic protein-coding genes have a highly
conserved sequence called the TATA box (homologous to the prokaryotic
-10 box) positioned between 25 and 35 bp upstream of the transcription
start site.

• Formation of the transcription initiation complex requires sequential
assembly of general transcription factors at the core promoter, starting with
TFIID and culminating in the binding of RNA polymerase II to the core
promoter.

• Changes in the structure of chromatin to remove the obstacle presented by
nucleosomes are necessary for proteins involved in transcription to gain
access to eukaryotic DNA, and chromatin modification contributes to the
regulation of gene expression.

• Specific transcription factors bind to constitutive and regulatory DNA
elements in the upstream promoter, and to enhancer and silencer sequences
which may be even greater distances from the promoter. These regulate
transcriptional initiation either by stabilising or destabilising binding of the
RNA polymerase and general transcription factors to the core promoter, or
by attracting proteins that modify chromatin locally to make it either more
or less accessible.

• The level of gene expression in eukaryotes (as in prokaryotes) is largely
determined by the regulation of transcriptional initiation, but in some cases
can also be controlled by elongation factors or termination mechanisms.


6.5 Post-transcriptional control of gene 
expression in eukaryotes 


Prokaryotic mRNAs are ready for translation without further processing. 
Indeed, prokaryotic mRNAs start to be translated even before transcription is 
completed. 


= Bearing in mind the differences between prokaryotic and eukaryotic 
intracellular structure, why is coupling of transcription and translation 
possible in prokaryotic cells but not in eukaryotic cells? 


© Prokaryotes have no nuclear envelope separating their DNA from the 
cytoplasm. In contrast, transcription takes place in the nucleus of 
eukaryotic cells, so the mature mRNAs must be transported out of the 
nucleus to the cytoplasm before translation can take place at ribosomes in 
the cytoplasm. 


Transcription is really only the first step in the production of a mature 
eukaryotic mRNA. The primary transcript must undergo a number of 
modifications including removal of intron sequences, capping of the 5’ end 
and polyadenylation of the 3' end (Figure 6.5) before it is ready for export 
from the nucleus to the cytoplasmic ribosomes. These processing events and 
their regulation are described briefly in this section. 


6.5.1 Modification of the ends of mRNA 


Capping is a process that occurs immediately after transcription has started. 
Eukaryotic mRNA is capped at its 5’ end by the addition of an unusual 
modified guanosine triphosphate (GTP) nucleotide. The reaction is catalysed 
by the enzyme guanylyl transferase, which is a component of the RNA 
polymerase II complex. The 5’ cap has several important functions in 
eukaryotes: 


• It is required for mRNA export through the nuclear pores.

• It is essential for ribosome binding and efficient translation of the mRNA
(Section 6.6).

• It prevents enzymes in the cytoplasm from degrading the mRNA from its
5' end before it can be translated.


Capping can therefore also regulate protein expression by regulating mRNA 
half-life and translation. An interesting example of how the cap can be used in 
gene regulation comes from studies on influenza virus, which contains RNA 
as its genetic material. This virus does not cap its own genome segments, but 
steals pre-formed caps from host mRNA, a process called ‘cap-snatching’. The 
host mRNA, which has lost its cap, cannot bind to ribosomes and initiate 
translation (Section 6.6.1). Thus, host protein can no longer be made, but viral 
protein can. 


Almost all eukaryotic mRNAs (and some non-protein-coding RNAs) also 
undergo polyadenylation of their 3’ end, immediately after termination of 
transcription and release of the primary transcript. Polyadenylation involves 
addition of multiple adenosine nucleotides to generate a poly(A) tail. The 
length of the tail may vary, but around 200 adenosines is usual. The poly(A)
tail acts as the binding site for poly(A) binding protein which inhibits 
degradation of mRNA from the 3’ end and can also promote its export from 
the nucleus, and its translation. 


6.5.2 mRNA splicing 


You will recall from Section 5.8.1 that in eukaryotic genes, the sequences 
encoding the polypeptide (exons) are interrupted by non-coding introns. The 
introns must therefore be removed from the primary transcript to produce a 
mature mRNA suitable as a template for translation. Introns are removed by 
mRNA splicing, a process in which the introns are cut out and the exons are 
joined together to form the mature mRNA. Splicing is carried out by a 
complex known as the spliceosome, containing several proteins and small 
nuclear RNAs (snRNAs, Section 6.2.1). The snRNAs bind to complementary 
sequences at the ends of introns, and the spliceosome then cuts out each 
intron and joins the ends of the flanking exons together. 


In fact, mRNA splicing also has an important role in gene regulation, because 
the outcome of splicing may produce a number of different transcripts 
depending on which exons are included in the mature transcript (Figure 6.19). 
This differential or alternative splicing may therefore result in the expression 
of several different mRNAs from the same gene in different cells and tissues. 
The mRNAs produced in this way (which share some exons and not others) 
are called splice variants or splice isoforms. The resulting variant mRNAs
may be translated into different protein isoforms; thus, a single gene may code 
for multiple proteins which may have similar or different properties. 


Gene control by tissue-specific alternative RNA splicing has been observed in 
many eukaryotic genes; the fibronectin gene is a good example. Fibronectin is 
an extracellular matrix protein secreted by fibroblasts and liver cells (as well 
as other types of cells). The form of fibronectin secreted by fibroblasts can 
interact with cell surface receptors called integrins (Section 3.5.1) and helps 
cells to adhere to the extracellular matrix. The form of fibronectin that is 
secreted by liver cells, in contrast, does not promote adhesion, but is secreted 
into the circulation and is a major component of blood plasma. Both types of
protein are encoded by the same fibronectin gene, but the primary transcript is
subject to alternative splicing. In fibroblasts, splicing yields a mature mRNA
that includes two exons encoding the protein domains that interact with cell
surface receptors in several cell types. Splicing of the fibronectin primary
transcript in liver cells, however, ‘skips’ these two exons, yielding a mature
mRNA that does not encode the corresponding domains.


Figure 6.19 Alternative splicing of eukaryotic mRNA. The gene in this example
contains four protein-coding exons (dark purple bands) separated by non-coding
introns (light purple bands). The same sequence is transcribed into the primary
mRNA, which then undergoes splicing to yield mature mRNAs. Two splice-variant
mRNAs are shown that have resulted from splicing the same primary transcript in
different ways to retain different combinations of exons.
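The exon skipping described above for fibronectin can be sketched as a simple string operation. The Python below is a toy model: the four exons and their sequences are invented, introns are assumed to have been removed already, and the two skipped exons merely stand in for the receptor-binding exons mentioned in the text.

```python
def splice(exons, skip=()):
    """Join exon sequences in order, leaving out any exon named in `skip`.

    exons: list of (exon name, RNA sequence) in their order along the gene.
    """
    return "".join(sequence for name, sequence in exons if name not in skip)


# Hypothetical exons of a single gene (sequences invented for illustration).
exons = [
    ("exon1", "AUGGCU"),
    ("exon2", "GGAUCC"),
    ("exon3", "CCAAGG"),
    ("exon4", "UUUUAA"),
]

# One cell type retains every exon; another skips exons 2 and 3.
full_length_mrna = splice(exons)
exon_skipped_mrna = splice(exons, skip={"exon2", "exon3"})

print(full_length_mrna)
print(exon_skipped_mrna)
```

Both mature mRNAs come from the same primary transcript, yet the shorter one lacks the sequence that would encode the skipped protein domains.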


6.5.3 The stability of mRNA


At one time, biologists generally assumed that the rate at which a particular 
mRNA was synthesised was the major determinant of the amount of that 
mRNA in the cell, and therefore the amount of the corresponding protein in 
the cell. Further research has, however, demonstrated that the rate of mRNA
turnover can also significantly affect the rate of protein synthesis. In other 
words, the production of proteins reflects not only how fast their mRNA 
templates are made, but also how fast the mRNA is broken down, or 
degraded. In eukaryotic cells there is a balance between the processes of 
translation and mRNA degradation. 
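The way synthesis and degradation together set the amount of an mRNA can be made concrete with a small calculation. The sketch below assumes a simple model that is not stated in the text: the mRNA is made at a constant rate and each molecule has a fixed chance of being degraded per unit time (first-order decay), so the steady-state level is the synthesis rate divided by the degradation rate constant. The numbers are invented.

```python
import math


def degradation_constant(half_life_min):
    """First-order rate constant (per minute) for a given mRNA half-life."""
    return math.log(2) / half_life_min


def steady_state_mrna(synthesis_per_min, half_life_min):
    """Steady-state number of mRNA molecules per cell, reached when
    synthesis exactly balances degradation."""
    return synthesis_per_min / degradation_constant(half_life_min)


# Two hypothetical mRNAs transcribed at the same rate (2 molecules per minute)
# but with different stabilities.
unstable = steady_state_mrna(synthesis_per_min=2.0, half_life_min=30.0)
stable = steady_state_mrna(synthesis_per_min=2.0, half_life_min=60.0)

print(round(unstable, 1), round(stable, 1))
```

Under this model, doubling the half-life doubles the steady-state level even though the rate of transcription is unchanged.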


= How are mRNAs that are being actively translated protected from 
degradation? 


© By binding to ribosomes and poly(A) binding protein and by 5’ capping
(Section 6.5.1). 


Messenger RNAs are constantly attacked by enzymes called ribonucleases 
that remove nucleotides from the end of the poly(A) tail and gradually shorten 
it over time, so that the mRNA eventually becomes vulnerable to complete 
degradation. 




Regulated changes in the turnover rates of various mRNAs can also help the 
cell to respond quickly to changes in its environment. An extreme example of 
this is the herpes simplex virus which causes cold sores and some genital 
infections. Like other viruses, it uses the molecular machinery of the infected 
host cell to replicate itself. It appropriates the host cell’s protein synthesis 
machinery and eliminates competing demands by producing a viral 
ribonuclease that preferentially degrades the host cell’s mRNA molecules, 
thereby freeing up more ribosomes to translate viral mRNAs. 


In some cases, a protein may regulate the level of its own mRNA. An 
example of this is tubulin. As you learnt in Section 3.4.2, tubulin monomers 
assemble into microtubules, which are part of the cell’s cytoskeleton. When a 
large pool of free tubulin monomers is available, there is no need for the cell 
to make more. Under these conditions, tubulin molecules bind to tubulin 
mRNA, making it susceptible to breakdown by ribonucleases, and reducing 
the synthesis of new tubulin protein. 


6.5.4 Control of gene expression by short microRNAs 


It has become increasingly apparent in recent years that certain types of non- 
coding RNA (Section 6.2.1) are important regulators of gene expression. The 
most extensively studied of these are microRNAs (miRNAs). These are very 
small ncRNAs, about 22 nucleotides long, found in the cells of animals and 
plants, which can negatively regulate, or ‘silence’ the expression of certain 
genes. 


MicroRNAs are themselves encoded by genes that often occur in clusters. 
These are transcribed as long, polycistronic primary transcripts that are 
cleaved, first in the nucleus (by a ribonuclease called Drosha), and then in the
cytosol (by another ribonuclease called Dicer) to finally generate ~22 base 
pair double-stranded miRNA molecules (Figure 6.20). One strand of each 
miRNA is then incorporated into a protein complex called RNA-induced 
silencing complex (RISC) and guided to bind to 3’ regions of complementary 
target mRNAs, resulting in either degradation of the target mRNA or 
inhibition of its translation. miRNAs therefore provide yet another mechanism 
of regulating gene expression. Regulation by miRNAs is not as simple as one 
miRNA per gene. Each miRNA can potentially target multiple mRNAs. There 
are more than 700 miRNAs encoded in the human genome that could, in 
theory, collectively regulate at least a third of human genes. 
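How one miRNA might recognise several different mRNAs can be sketched as a search for short complementary sites. The Python below makes an assumption that goes beyond the text: it looks only for perfect pairing with a short ‘seed’ near the 5' end of the miRNA (nucleotides 2 to 8), which is one common way of predicting targets. The miRNA and 3' region sequences are invented.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(rna):
    """Return the antiparallel complementary RNA sequence, written 5' to 3'."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(rna.upper()))


def seed_sites(mirna, three_prime_region, seed_start=1, seed_end=8):
    """Return 0-based positions in an mRNA 3' region that are perfectly
    complementary to the miRNA seed.

    The default slice [1:8] selects nucleotides 2 to 8 of the miRNA
    (an assumed definition of the seed, not taken from the text).
    """
    seed = mirna.upper()[seed_start:seed_end]
    site = reverse_complement_rna(seed)
    region = three_prime_region.upper()
    return [
        i
        for i in range(len(region) - len(site) + 1)
        if region[i:i + len(site)] == site
    ]


# Hypothetical 22-nucleotide miRNA and two hypothetical mRNA 3' regions.
mirna = "UACGGAUCCAUGCAAGUCGAUC"
print(seed_sites(mirna, "AAAGAUCCGUAAACCC"))  # contains one matching site
print(seed_sites(mirna, "AAACCCGGGUUUAAAC"))  # contains none
```

Because the matching site is only a few nucleotides long, it can occur in many different mRNAs, which is consistent with a single miRNA having multiple targets.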


Although the function of most miRNAs is yet to be determined, many seem to 
be important during the development of different tissues in multicellular 
organisms. Different miRNAs may coordinate the regulation of a gene at 
different times during development. One particular miRNA, known as miR-1, 
is expressed in developing heart muscle and regulates the level of several 
muscle-specific transcription factors that control muscle protein expression. In 
a strain of mutant mice that do not express miR-1, the cardiac muscle 
precursor cells, i.e. the cells destined to differentiate into specialised cardiac 
muscle cells, proliferate a great deal more than normal, but are unable to 
differentiate into mature functional muscle. This results in severe heart 
defects. miR-1 therefore seems to ‘fine-tune’ the balance between the 
proliferation of cardiac muscle precursors and their differentiation into mature 
muscle cells. You will learn more about cell differentiation in Book 3, 
Chapter 1. 


Figure 6.20 The expression and processing of miRNAs. A polycistronic primary 
transcript including several miRNA sequences is processed in the nucleus by the 
Drosha ribonuclease to separate the hairpin-shaped pre-miRNAs. These are 
processed further in the cytoplasm by the Dicer ribonuclease to form ~22 base pair 
double-stranded miRNAs which associate with the RISC complex as single 
strands. The RISC complex then recognises mRNAs complementary to the miRNA 
strand and, depending on the degree of complementarity between the miRNA and 
the target mRNA, either blocks their translation, or degrades the mRNA. 




There is evidence that miRNAs act as key regulators of a wide range of 
processes, and not surprisingly, they have been implicated in a number of 
diseases including heart disease, neurological diseases and cancers. The ability 
of complementary RNAs to regulate gene expression has also been exploited 
for research into gene function (Box 6.3). 


Box 6.3 Technologies using antisense RNA, miRNA and siRNA 


The ability of RNA molecules to specifically switch off the expression of 
individual genes has exciting potential for cell biology research. The first 
technique of this type to be developed was the use of antisense RNA, in 
which a single-stranded RNA complementary to a specific mRNA is 
introduced into cells. The double-stranded hybrid RNA that forms cannot 
be recognised by the ribosome, so mRNA translation and synthesis of a 
polypeptide are blocked. However, this technique has proved difficult to 
develop for commercial and clinical applications. 


While researchers were conducting experiments to test the effects of 
antisense RNAs, it became apparent that the complementary sense RNA 
(used in control experiments) unexpectedly turned out to be just as 
effective as the antisense RNA. The reason is probably that these 
experiments were contaminated by double-stranded RNAs (dsRNAs) 
formed from sense and antisense duplexes. The ability of dsRNA to 
specifically suppress the expression of a gene containing the same 
sequence is called RNA interference (RNAi). When the dsRNA enters a 
cell, it is cleaved by Dicer into fragments about 22 bp long called small 
interfering RNAs (siRNAs). The two strands of the siRNA separate, and 
the antisense strand is incorporated into the RISC complex. The RISC 
complex then degrades any mRNAs containing sequence complementary 
to the siRNA. 


■ In what ways is the processing of siRNAs and miRNAs similar? 


□ Both are processed by Dicer in the cytoplasm to produce a ~22 bp 
duplex, one strand of which is incorporated into the RISC complex. 


Small double-stranded siRNA molecules can be easily 
synthesised in the laboratory and the RNAi technique is routinely used to 
turn off expression of individual genes in order to explore their function 
in cells or whole organisms (Box 4.1). siRNA ‘libraries’ are available 
which include siRNAs capable of targeting almost every individual gene 
in the genomes of C. elegans, Drosophila and humans. 


miRNAs and siRNAs also have potential for gene therapy. Research has 
identified siRNAs that could be used to treat the eye condition age-related 
macular degeneration (AMD). siRNAs that repress the expression of the 
growth factor VEGF prevent the overgrowth of blood vessels in the 
retina of the eye that can result in vision loss in people with AMD. The 
expression of some other natural miRNAs is defective in certain cancers, 
and artificial expression of a specific miRNA in mice with an aggressive 




Summary of Section 6.5 


• Most eukaryotic mRNAs are first produced as primary transcripts which 
must be processed to produce a mature RNA. This is achieved by capping 
(addition of a GTP residue at the 5’ end), polyadenylation (addition of a 
variable number of adenosine nucleotides to generate a poly(A) tail at the 
3’ end) and splicing to remove the non-coding introns from the mRNA. 

• Availability of mRNA is an important factor in gene regulation; the 
production of proteins reflects not only how fast their mRNA templates are 
made, but also how fast they are degraded. 

• Degradation of mRNA is a specific process that removes coding mRNAs 
whose protein products are no longer required. mRNA stability (and 
therefore half-life) is affected by modifications like capping and 
polyadenylation. 

• MicroRNAs (miRNAs) have an important role in post-transcriptional 
regulation of gene expression, by targeting specific mRNAs for 
degradation, or by inhibiting their translation. This phenomenon can be 
exploited experimentally by introducing chemically synthesised small 
interfering RNAs (siRNAs) into cells in order to target specific mRNAs. 
This allows researchers to turn off the expression of specific genes to 
study the effects on a cell or organism. 


6.6 Translation 


Translation is the process by which the genetic information carried in an 
mRNA is translated into a polypeptide chain. Each amino acid in the 
polypeptide is encoded by a triplet codon. 


■ How many different codons are there? 


□ 64. There are four different types of nucleotide (A, G, C and U/T), which 
can be arranged in 4³ = 64 different combinations of three. 
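The count can be checked by enumerating every ordered triplet of the four RNA bases. The short Python sketch below is purely illustrative; the variable names are not taken from the module text.

```python
from itertools import product

BASES = "ACGU"  # the four RNA nucleotides

# Every ordered triplet of bases is one possible codon.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
codon_count = len(codons)  # 4 * 4 * 4
```

Order matters here (AUG and GUA are different codons), which is why the total is 4 × 4 × 4 rather than the number of unordered base combinations.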


As described previously (Section 6.2.2), transfer RNAs (tRNAs) form the link 
between the mRNA sequence and the polypeptide sequence. The terminology 
used to describe specific tRNAs uses a superscript to denote the amino acid 
that is carried by the tRNA. For example, a tRNA carrying methionine is 
denoted tRNA^Met. 


The ribosome has a similar structure in both prokaryotes and eukaryotes, and 
consists of a large complex of ribosomal RNAs and proteins (more than 50 
proteins in prokaryotes and more than 80 proteins in eukaryotes) organised 
into a large and a small subunit (Figure 3.15). As with gene transcription, 
translation can be divided into three stages: initiation, elongation and 
termination of the polypeptide chain. The last two stages are similar in 
prokaryotes and eukaryotes, but initiation is somewhat different, as will be 
described in the next two sections. 


In some cases, translation initiation is cap-independent, and depends instead 
on internal sequences in the RNA, but this mechanism will not be described 
in detail here. 


6.6.1 Initiation of translation in prokaryotes 


The individual subunits of ribosomes are present free in the cytoplasm, and 
for translation to be initiated, a complete ribosome must assemble at the 
appropriate place on an mRNA molecule. Attachment of the ribosome to the 
mRNA occurs at the ribosome binding site, which in E. coli has the 
consensus sequence 5’ AGGAGGU 3’ (Figure 6.21a). This site lies 3–10 
nucleotides upstream of the translation initiation codon, which is the first 
codon in the protein-coding open reading frame of the mRNA. The initiation 
codon is almost always 5’ AUG 3’, which codes for the amino acid methionine. 
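The spatial rule described above (ribosome binding site, then a spacer of 3–10 nucleotides, then AUG) can be written as a simple pattern search. The Python sketch below is a minimal illustration under stated assumptions: it demands an exact match to the consensus sequence, whereas real binding sites vary, and the example mRNA and the function name are hypothetical.

```python
import re
from typing import Optional

RBS = "AGGAGGU"  # E. coli consensus ribosome binding site quoted in the text

# RBS, then a spacer of 3-10 nucleotides, then the AUG initiation codon.
# The non-greedy spacer selects the nearest AUG inside the allowed window.
PATTERN = re.compile(rf"{RBS}[ACGU]{{3,10}}?AUG")


def find_initiation_codon(mrna: str) -> Optional[int]:
    """Return the 0-based index of the AUG downstream of an exact RBS, or None."""
    match = PATTERN.search(mrna)
    return match.end() - 3 if match else None


mrna = "GGCUAGGAGGUAACUGCAUGAAACGUUAA"  # made-up sequence with a 6-nt spacer
start = find_initiation_codon(mrna)
```

An AUG that is not preceded by a binding site at the right distance is ignored by this search, which mirrors the point that the ribosome is positioned by the binding site rather than by the codon alone.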


In prokaryotes, the small (30S) subunit of a ribosome binds first to the 
ribosome binding site on the mRNA, with the help of two translation 
initiation factor proteins called IF1 and IF3 (Figure 6.21b), which also prevent 
the large ribosomal subunit from binding too soon. Next, the initiator tRNA 
carrying methionine (tRNAi^Met) is brought to the small subunit of the 
ribosome by another translation initiation factor, IF2, which is bound to a 
molecule of GTP (an energy-rich molecule similar to ATP, 
Section 1.2.1) (Figure 6.21c). The large (50S) subunit of the ribosome then 
attaches to the small subunit using the energy stored in the GTP molecule 
bound to IF2 (Figure 6.21d). During this process, the GTP is hydrolysed to 
GDP by breaking the bond to one of its phosphate groups (releasing the 
energy stored in the bond). IF1, IF2 and IF3 then dissociate from the ribosome 
complex and translation initiation is complete. 


6.6.2 Initiation of translation in eukaryotes 


In contrast to prokaryotic mRNA, eukaryotic mRNA does not have a ribosome 
binding site, so the initiation codon has to be located by another mechanism. 
In eukaryotes, a complex including the small (40S) ribosomal subunit, an 
initiator tRNAi^Met and the eukaryotic translation initiation factor eIF2 
(bound to GTP) is formed independently in the cytoplasm (Figure 6.22a). 
Initiation of translation usually involves the attachment of this complex to 
the mRNA at the 5’ cap structure, a process requiring two more translation 
initiation factors called eIF3 and eIF4 (Figure 6.22b). The whole complex then 
scans along the mRNA until it locates the first translation initiation codon, 
5’ AUG 3’. Selection of the correct 5’ AUG 3’ is facilitated by recognition of 
the consensus sequence 5’ ACCAUGG 3’, in which the central AUG is the 
initiation codon. This consensus sequence is called the Kozak sequence after 
Marilyn Kozak, the scientist who identified it. 
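The scanning mechanism can be sketched as a left-to-right search for the first AUG, followed by a comparison of its surroundings with the consensus sequence. The Python below is a hedged illustration only: the scoring of flanking positions is an invented convenience, not a measure used in the module text, and the example mRNA is hypothetical.

```python
KOZAK = "ACCAUGG"  # consensus from the text; AUG occupies positions 3-5 (0-based)


def scan_for_first_aug(mrna: str) -> int:
    """Scan 5'->3' from the cap end; return the index of the first AUG, or -1."""
    return mrna.find("AUG")


def kozak_matches(mrna: str, aug_index: int) -> int:
    """Count how many of the 4 flanking consensus positions match around an AUG."""
    window_start = aug_index - 3
    if window_start < 0 or aug_index + 4 > len(mrna):
        return 0  # not enough flanking sequence to compare
    window = mrna[window_start:aug_index + 4]
    flanking = [0, 1, 2, 6]  # positions of A, C, C and the final G
    return sum(window[i] == KOZAK[i] for i in flanking)


mrna = "GCGGCACCAUGGCUUAA"  # made-up 5' end of an mRNA
aug = scan_for_first_aug(mrna)
score = kozak_matches(mrna, aug)
```

In this example the first AUG sits in a perfect consensus context, so all four flanking positions match.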


Figure 6.21 Initiation of translation in prokaryotes. (a) and (b) The small (30S) 
ribosomal subunit and the translation initiation factors IF1 and IF3 bind to the 
ribosome binding site. (c) The initiator tRNA is then brought in by IF2 and binds 
to the start codon (AUG). (d) The GTP molecule (red dot) bound to IF2 is 
hydrolysed to GDP (grey dot) and inorganic phosphate to provide energy for 
binding of the large (50S) ribosomal subunit, and the translation initiation factors 
dissociate from the complex, which completes initiation. 




Once the initiation codon is recognised, the GTP bound to eIF2 is hydrolysed 
to provide energy for the attachment of the large (60S) ribosomal subunit, and 
at the same time the eIF2, eIF3 and eIF4 factors all dissociate from the small 
ribosomal subunit (Figure 6.22c). Initiation is now complete and the ribosome 
commences polypeptide elongation. 


Figure 6.22 Initiation of translation in eukaryotes. (a) and (b) A preformed 
complex comprising the small (40S) ribosomal subunit, the translation initiation 
factor eIF2 and the initiator tRNA attaches to the 5’ cap structure of the mRNA 
(facilitated by eIF3 and eIF4) and scans along to locate the start codon (AUG). 
(c) The GTP molecule (red dot) bound to eIF2 is hydrolysed to GDP (grey dot) 
and inorganic phosphate to provide energy for binding of the large (60S) 
ribosomal subunit, and the translation initiation factors dissociate from the 
complex, which completes initiation. 


6.6.3 Elongation of the polypeptide 


Polypeptide chain elongation is similar in prokaryotes and eukaryotes and 
involves a great many protein factors, so the full details are not explored here. 
The large ribosomal subunit attached to the mRNA has two binding sites for 
tRNAs: a P (polypeptide) site and an A (acceptor) site. The P site is shown 
already occupied by tRNAi^Met in Figure 6.23a. A translational elongation 
factor (called EF-Tu in prokaryotes and eEF1 in eukaryotes) bound to a GTP 
molecule brings in a second tRNA molecule with the appropriate anticodon to 
the A site (Figure 6.23b). If the codon–anticodon pairing is correct, the 
elongation factor hydrolyses GTP to GDP and inorganic phosphate, creating a 
conformational change in the ribosome that causes the incoming tRNA to 
fully enter the A site. The ribosome then catalyses the formation of a peptide 
bond between the methionine at the P site and the second amino acid at the 
A site (phenylalanine in the example shown in Figure 6.23c). 


Figure 6.23 Elongation of translation (see description in the text). 


The next step is translocation, during which three things happen 
(Figure 6.23d): 

i The mRNA moves along three nucleotides so that the tRNA^Phe originally 
in the A site moves to the P site, and the next codon in the mRNA 
(CUG in the example shown) now enters the A site and a new tRNA 
(tRNA^Leu) occupies it. 

ii The amino acid bound to tRNAi^Met is released. 

iii The tRNAi^Met leaves the ribosome. 


The translocation step is facilitated by another type of elongation factor 
(called EF-G in prokaryotes and eEF2 in eukaryotes) which also carries a GTP 
molecule. The elongation factor hydrolyses GTP causing a conformational 
change in the ribosome that triggers translocation. A new peptide bond is 
formed with the next amino acid. Repetition of this cycle elongates the 
polypeptide chain and synthesis continues towards the carboxy terminus 
(C-terminus) until a termination codon is reached. 


Messenger RNAs that are translated at free ribosomes in the cytoplasm are 
mostly found in polyribosomes, often abbreviated to polysomes. These are 
clusters of ribosomes that are all bound to a single mRNA molecule. As soon 
as one ribosome has moved far enough away from the ribosome binding site, 
another ribosome is able to bind and initiate translation, so that many 
ribosomes can be spaced along the mRNA, each synthesising a polypeptide 
chain (Figure 3.16). This multiple initiation from one transcript means that 
new protein molecules can be produced very quickly. 


■ In eukaryotes, proteins destined for lysosomes, for the cell membrane 
or for export from the cell are synthesised on ribosomes attached to the 
rough endoplasmic reticulum (RER). How are these proteins directed to 
the RER? 


□ These proteins contain an RER localisation signal sequence at the amino 
terminus of the translated protein, so while their translation is initiated on 
free ribosomes in the cytoplasm, as soon as the signal sequence is 
recognised (by the signal recognition particle) the whole complex of 
ribosome, mRNA and partially translated polypeptide is relocated to the 
RER to complete translation. 


6.6.4 Termination of translation 


When a termination codon is reached, a release factor (RF) protein enters the 
A site of the ribosome instead of a tRNA. The RF recognises one of the three 
termination or stop codons, UAA, UAG or UGA, and ends elongation, 
releasing the completed polypeptide from the tRNA in the P site. The 
ribosomal subunits are released back into the cytoplasmic pool. 
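The whole cycle of initiation at AUG, codon-by-codon elongation and termination at a stop codon can be summarised in a short Python sketch. It is a deliberately simplified illustration: it uses the standard genetic code with one-letter amino acid symbols (M = methionine, F = phenylalanine, L = leucine), starts at the first AUG it finds, and ignores ribosome binding sites, initiation factors and release factors. The example mRNA and the function name are hypothetical.

```python
from itertools import product

# Standard genetic code, listed with bases in the order U, C, A, G.
# '*' marks the three termination codons.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
STOP_CODONS = {codon for codon, aa in CODON_TABLE.items() if aa == "*"}


def translate(mrna: str) -> str:
    """Translate from the first AUG until a stop codon or the end of the mRNA."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no initiation codon, so no polypeptide
    peptide = []
    for i in range(start, len(mrna) - 2, 3):  # read one codon at a time
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # termination codon: release the polypeptide
            break
        peptide.append(amino_acid)
    return "".join(peptide)


# AUG (Met), UUC (Phe), CUG (Leu), then the stop codon UAA
peptide = translate("GGAUGUUCCUGUAAGG")
```

The example reproduces the Met, Phe, Leu order used in the description of elongation, and reading stops at UAA without adding an amino acid.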


6.6.5 Translational and post-translational control of gene expression 


As you have learnt, mechanisms that control the amount of mRNA produced 
by transcription are crucial for controlling gene expression. If, however, a 
particular mRNA is present but there is no need for the cell to make the 
protein, it is wasteful and may indeed be harmful for the mRNA to be 
translated. There are several mechanisms that regulate the initiation of 
translation of some mRNAs. For example, the recruitment of the small 
ribosomal subunit can be inhibited by secondary structure in the mRNA (loops 
and hairpins formed by complementary base pairing), or by specific RNA 
binding proteins. The control of the production of the iron storage protein 
ferritin in certain mammalian cells is a good example of the latter mechanism. 




If the iron level in the cell is low, ferritin is not required and a translational 
repressor protein binds to the ferritin mRNA and prevents its translation by 
blocking its attachment to the ribosome. When iron levels rise, the free iron 
binds to the repressor and alters its conformation, causing it to detach from 
the mRNA. Translation of ferritin protein then proceeds. 


Translated proteins that are incomplete or misfolded may be disruptive or even 
toxic and are rapidly degraded in the cell by a very large protein complex 
known as the proteasome. Most proteins are eventually degraded by the 
proteasome when they are no longer needed. The half-life of a protein largely 
depends on its N-terminal amino acid, and whether it possesses certain 
sequences: for example, regions rich in the amino acids Pro (P), Glu (E), Ser 
(S) and Thr (T) (called PEST sequences) mark the protein for rapid 
degradation. The balance between protein synthesis and protein degradation 
(protein turnover) contributes to the regulation of protein levels and ensures 
that old, damaged proteins don’t accumulate in the cell. 


Translated polypeptides, particularly in eukaryotes, undergo a whole range of 
regulated post-translational modifications. Many polypeptides are not active 
when they are first formed but require additional folding, binding to other 
polypeptides, enzymatic cleavage, or modifications, such as phosphorylation 
or glycosylation of particular amino acids, before they become fully active. 
There is not space to discuss post-translational protein modification and 
protein turnover here, but you will come back to them in Chapter 1 of 
Book 2, and you will meet some examples throughout the module. 


Summary of Section 6.6 


• The three types of RNA molecule – mRNA, tRNA and rRNA – are all 
essential for protein translation. mRNA carries the genetic code for 
synthesising a polypeptide, and tRNA is the key for deciphering the code; 
the amino acid specified by the code is brought to the ribosome by tRNA and 
added to the growing polypeptide chain. rRNA is an essential component 
of the ribosome. 

• Protein translation occurs in three stages: initiation, elongation and 
termination. All three stages take place at the ribosome, and are somewhat 
similar in prokaryotes and eukaryotes. A number of highly conserved 
translation factors facilitate these processes. 

• In prokaryotes, translation initiation involves binding of the small 
ribosomal subunit to the ribosome binding site in the mRNA. This is 
followed by binding of the initiator tRNA (tRNAi^Met) to the initiation 
codon (AUG), followed by binding of the large ribosomal subunit. 

• In eukaryotic translation initiation, a preformed complex of the small 
ribosomal subunit, tRNAi^Met and specific translation initiation factors binds 
to the 5’ cap structure of the mRNA and scans along the mRNA to locate 
the initiation codon. Initiation is completed by binding of the large 
ribosomal subunit. 

• During elongation, mRNA moves through the ribosome and as each codon 
enters, a tRNA with the correct anticodon occupies the A (acceptor) site 
of the ribosome. The ribosome forms a peptide bond between the amino 
acid carried by the tRNA in the A site and the amino acid already present 
in the P (polypeptide) site. The mRNA moves along and a new tRNA, 
with its attached amino acid, enters the A site (translocation). 

• Translation terminates at a termination codon (UAA, UAG or UGA) when 
a release factor (RF) protein enters the A site of the ribosome instead of a 
tRNA. 

• Most eukaryotic proteins are modified post-translationally in a number of 
ways (including phosphorylation and glycosylation) before they become 
fully functional. 

• Incompletely translated or misfolded proteins are degraded by the 
proteasome. 


6.7 Final word 


Many years of painstaking research and the development of sensitive 
techniques for detecting and quantifying RNAs and proteins have begun to 
reveal many of the complex processes that ensure cells can perform their 
different functions at the appropriate place and time. The level of expression 
of genes in prokaryotes and eukaryotes is controlled largely by the regulation 
of their transcription, but the processing and stability of eukaryotic mRNAs 
also contribute to the complexity of eukaryotic gene expression patterns. The 
translation of mRNA to synthesise proteins can also be controlled, and the 
protein itself may be modified to alter its activity. Although in this chapter the 
focus has been on the control of individual genes, it should be emphasised 
that in a cellular context it is the coordinated expression of the whole genome 
that determines the cell’s development and function. 


The next book in the module addresses ‘the working cell’, and considers in 
more detail the structure and function of cellular proteins; cell membranes and 
their role in transport of substances into and out of cells; how cells capture 
and use energy; how they transmit, receive and respond to signals from their 
environment; and finally, how they carry out a variety of different types 
of movement. 


6.8 Learning outcomes 


6.1 Describe the stages of gene expression in prokaryotic and eukaryotic 
cells. 


6.2 Describe the important features of the nucleic acids and protein 
molecules involved in transcription and translation in prokaryotic and 
eukaryotic cells. 


6.3 Distinguish between the mechanisms controlling the efficiency of 
transcription and translation in prokaryotic and eukaryotic cells. 


6.4 Explain how post-transcriptional processing differs between prokaryotes 
and eukaryotes. 


6.5 Outline some of the techniques used to study gene expression. 




Acknowledgements 


Grateful acknowledgement is made to the following sources: 


Figure 1.8: Adapted from Futuyma, D.J. (2009) Evolution, 2nd edn, Sinauer 
Associates Inc.; Figure 1.11a: © Jarrodl — Dreamstime.com; 

Figure 1.11b: Mark Grimson, Texas Tech University; Figure 1.11c: John 
Anderson/iStockphoto.com; Figure 2.3: Source of image unknown; 

Figure 2.4: Dr Jack Bostrack, Visuals Unlimited/Science Photo Library; 
Figures 2.5, 2.7c, 2.8, 2.18, 2.19, 2.20a, 2.21a, 2.22b, c, 2.23a and 

3.24: Courtesy of Jill Saffrey, Department of Life, Health and Chemical 
Sciences, Open University; Figure 2.7a: Moll, R. et al. (2008) ‘The human 
keratins: biology and pathology’, Histochemistry and Cell Biology, vol. 129, 
with kind permission from Springer Science and Business Media, Springer 
Verlag; Figure 2.7b: Courtesy of Susan Van Noorden; Figure 2.9a: Eye of 
Science/Science Photo Library; Figure 2.9b: Steve Gschmeissner/Science 
Photo Library; Figure 2.9c: Dr Richard Kessel and Dr Gene Shih, Visuals 
Unlimited/ Science Photo Library; Figure 2.9d: Martin Oeggerli/Science Photo 
Library; Figure 2.10a: Science VU, Visuals Unlimited/Science Photo Library; 
Figure 2.10b: CNRI/Science Photo Library; Figure 2.11: Prof. Heribert 
Cypionka/Oldenburg University; Figure 2.12b: Astrid and Hanns-Frieder 
Michler/Science Photo Library; Figures 2.13a and b: Taken from the book 
Freshwater Algae: Their microscopic world explored, Canter-Lund and Lund 
(1995), Biopress; Figure 2.13c: Wim Van Egmond, Visuals Unlimited/Science 
Photo Library; Figure 2.13d: Cristian Solari, University of Arizona; 

Figure 2.14b: Biospot Associates/Science Photo Library; 

Figure 2.15b: Thomas Deerinck, NCMIR/Science Photo Library; 

Figure 2.17b: Power and Syred/Science Photo Library; Figure 3.2b: Dr Gopal 
Murti/Science Photo Library; Figure 3.3: Courtesy of Heather Davies, 
Department of Life, Health and Chemical Sciences, Open University; 

Figure 3.10: Dr Torsten Wittmann/Science Photo Library; Figure 3.11a, 3.17a, 
3.19a and 3.22a: Courtesy of Professor Michael Stewart, Department of Life, 
Health and Chemical Sciences, Open University; Figure 3.12: Adapted from 
Purves, W.K. et al. (1998) Life, the Science of Biology, 5th edn, Sinauer 
Associates Inc; Figure 3.14: Dr Richard Kessel and Dr Gene Shih, Visuals 
Unlimited/Science Photo Library; Figure 3.16: Photo printed with permission 
from O. L. Miller, B. A. Hamkalo and C. A. Thomas Jr; Figure 3.21a: CNRI/ 
Science Photo Library; Figure 3.21b: Dr Don Fawcett/Science Photo Library; 
Figure 3.22c: Dr Gopal Murti/Science Photo Library; Figure 4.1a: National 
Institutes of Health, part of the United States Department of Health and 
Human Services; Figure 4.1c: Keith Bradnam, used under Creative Commons 
Attribution-ShareAlike License; Figure 4.1d: Nigel Cattlin/Getty Images; 
Figure 4.1e: Janeff/istockphoto.com; Figure 4.1f: Rama, licensed under the 
Creative Commons Attribution-Share Alike 2.0 France license; 

Figure 4.6b: Courtesy of the John Innes Foundation; Figure 4.12b: The 
American Philosophical Society; Figure 4.13: Dr Jeremy Burgess/Science 
Photo Library; Figure 5.5: HHMI; Figures 5.8, 5.10 and 5.30: Alberts, B. 

et al. (2002) Molecular Biology of the Cell, 4th edn, Garland Science, a 
member of the Taylor and Francis Group; Figure 5.21: Shanel, used under a 




Creative Commons Attribution-Share Alike 3.0 Unported license; 

Figure 5.22: Dr L. Caro/Science Photo Library; Figure 5.25: Abizar, used 
under a Creative Commons Attribution-Share Alike 3.0 Unported license; 
Figure 5.28b: Arturo Londono, ISM/Science Photo Library; Figure 5.31: 

Li, W.H. and Graur, D. (1991) Fundamentals of Molecular Evolution, 
copyright © Sinauer Associates Inc., Publishers; Figure 5.32: Hingorani, A.D. 
(2010) ‘Science, medicine and the future: translating genomics into improved 
healthcare’, British Medical Journal, vol. 341, 13 November 2010, BMJ 
Publishing Group Ltd; Figure 6.6: Lodish, H., Baltimore, D., Berk, A., 
Zipursky, S.L., Matsudaira, P. and Darnell, J., Molecular Cell Biology © 1995 
by W.H. Freeman and Company; Figure 6.8: Alberts, B. et al. (1998) 
Essential Cell Biology: An Introduction to the Molecular Biology of the Cell, 
Taylor & Francis Inc.; Figure 6.13 photograph: Josh Plotkin and the 
University of Pennsylvania. 


A number of the figures in this publication were created using data from the 
Protein Databank, http://www.rcsb.org. 


Every effort has been made to contact copyright holders. If any have been 
inadvertently overlooked the publishers will be pleased to make the necessary 
arrangements at the first opportunity. 




Module team 


Module team chair and author 
Robert Saunders 


Curriculum managers 
Viki Burnage 

Tracy Finnegan 

Curriculum assistants 
Becky Efthimiou 

Dawn Partner 

Book 1 chair and author 
Carol Midgley 


Book 2 chair and author 
Jane Loughlin 


Book 3 chair and author 
Colin Walker 


Authors 


Jill Saffrey 
Kerry Murphy 
Diane Butler 


Critical readers 


Christine Gardener 
Marion McKinnon 


External assessor 
Richard Tunwell 


Production team 


Greg Black (Media Developer) 

Martin Chiverton (LTS Producer) 

Corinne Cole (Media Assistant) 

Roger Courthold (Media Developer) 

Ruth Drage (Media Project Manager) 

Phil Gauron (Producer, Clear Focus Productions) 
Sara Hack (Media Developer) 

Chris Hough (Media Developer) 

Martin Keeling (Media Assistant, Rights) 

Bina Sharma (Media Developer) 




Andy Sutton (Media Developer) 
Peter Twomey (Media Developer) 


Indexer 


Jane Henley 


Other contributors 


Margaret Swithenby 
Sarat Vatsavayai 


Index 


Index entries and page numbers in 
bold refer to glossary terms. Page 
numbers in italics refer to entries in 
figures or tables. 


1000 Genomes Project 195 


A 


A (acceptor) site 247 
abasic site 158, 165 
acetic acid 8 
actin 79, 81, 99, 100 
actinobacteria 19, 43 
activator site (AS) 216, 218, 219 
acute macular degeneration (AMD) 
242 
adenine (A) 10, 146, 147, 148, 
157, 158 
adenosine 5'-triphosphate (ATP) 
14, 15 
from aerobic respiration 16 
generation 94 
adherens junctions 101 
adipocytes 58 
adrenal cortex 57 
adrenal gland 57, 58 
adrenal medulla 57 
aerobic respiration 15-16 
agar plates 41 
agarose gel 230, 231 
alanine 8 
albinism 135, 136 
Alcian blue 60 
algae 47, 48, 49 
reproduction 115 
alleles 12, 116 
linkage 125-6 
from recombination 128 
wild type and mutant 131-2, 
168, 181 
see also dominant alleles; 
genes; recessive alleles 
allosteric regulation 217 


alternation of generations 115 
alternative splicing 238, 239 
amino acids 5 

codons 164, 202, 205 

polymerisation 9 

primordial 6, 7, 8 
ammonia & 
amoeba 22, 23, 47, 48 
amoeboid movement 48 
amorphs 168 
ampicillin 174, 175, 176 
anabolic reactions 14 
anaerobic 15 
anaphase 111, 112, 113, 114 
anchoring junctions 101 
angiosperms 53 

sexual life cycle 115, 118 
animal cells 56, 57 

cell membrane 78 

culture 41-2 

nucleus 82 

schematic diagram 21, 68 

sexual life cycle 115 

size 32, 33 

see also mammals 
Animalia 18 


antibiotic-resistant gene 172, 176, 


183 
antibodies 37, 225 
immunolabelling 38-9, 40, 
69-71 
anticodon 205-6, 207 
antigen 38, 39 
antiparallel 147 
antisense RNA 242 
antisense strand 203, 204 
apical surface 60, 91, 92 
apoptosis 97 
Arabidopsis thaliana 107, 108 
Archaea 17, 18, 19 
aspartic acid 8 
Asterionella formosa 49 




ATP see adenosine 
5’-triphosphate (ATP) 

attenuation 219 

autophagy 92 

autoradiography 230 

autosomes 109, 110, 131 

mutation 134, 135, 136, 137 
autotrophs 15 
auxotrophs 181 


B-DNA 144 
Bacteria 17, 18, 19 
bacteria 
actinobacteria 19, 43 
cell culture 41 
cell membrane 76 
cell walls 75, 76 
colonies 22, 23 
cyanobacteria 43, 76 
endosymbiotic theory 21 
Gram-negative and positive 36, 
45, 15, 76, 178 
mismatch repair 156 
morphology 43, 44, 45 
operon 216-19 
schematic diagram 20 
size 45, 46 
transcription 211-14, 219, 220 
transfer RNA 206 
see also Escherichia coli 
bacteriophage 141, 170, 173, 183 
‘ball and stick’ model 144, 145 
base excision repair (BER) 157-8, 
165 
base pairing 146-7, 148 
complementary 153, 205, 206 
mismatch repair 156 
in RNA 163, 164 
wobble base pairs 206, 207 
see also codon 
basolateral surfaces 60, 91 
Bdellovibrio 45 




bidirectional replication 177, 178 
bilayers, phospholipids 13, 20, 78 
binary fission 110, 180 
biotechnology 1 
blood cells 57, 58, 59, 60 
see also leukocytes 
boxes 
1.1 Chemical evolution of 
organic molecules 6-8 
1.2 Variation, natural selection 
and evolution 11-12 
1.3 The last universal common 
ancestor (LUCA) 16-17 
1.4 Taxonomy and the tree of 
life 17-18 
2.1 Units for measuring the size 
of cells 31-2 
2.2 How a light microscope 
works 33, 34 
2.3 Histochemistry: the use of 
chemical stains to identify cells 
and some cell components 36-7 
2.4 Immunolabelling: using 
antibodies to identify molecules 
in cells and tissues 38-9, 40 
3.1 Cell fractionation 72-4 
4.1 Model organisms in genetic 
research 106-8 
4.2 A note on genetic 
nomenclature 119-20 
5.1 The polymerase chain 
reaction 152-4, 155 
5.2 Gene cloning 172-7 
5.3 DNA sequencing 191-4 
6.1 Reporter genes for analysis 
of gene expression 226 
6.2 Methods for studying gene 
expression 230-5 
6.3 Technologies using antisense 
RNA, miRNA and siRNA 
242-3 
bryophytes 53 


C 


C-terminus 207 

C-value paradox 184 

cadherins 101 

Caenorhabditis elegans 106-7, 
108, 185 




calcium 97, 183 
cancer 
gene expression 234, 235 
gene therapy 242 
and mutation 165 
and telomerase 190 
‘cap-snatching’ 238 
capping 209, 237-8 
capping protein 80-1 
capsid 183, 190 
capsule 20 
catabolic reactions 14, 15 
catabolite activator protein (CAP) 
218-19 
catalase 93 
Caulobacter crescentus 44 
Cell: The spark of life 3, 6, 11, 14, 
25 
cell biology, technical/medical uses 
1 
cell communication 29 
cell cortex 80 
cell culture 39, 41-3, 95 
cell cycle 111
cell diversity 27-30 
eukaryotic 46-64 
prokaryotic 43, 44, 45-6 
cell division 23, 64 
eukaryotic 110-14 
prokaryotic 110 
cell fractionation 71-2, 73, 74, 93 
cell junctions 100-2 
cell lines 39, 43 
cell membrane 13, 14, 20, 68 
bacteria 76 
eukaryotic 78 
cell wall 20, 53, 68 
bacteria 75, 76 
eukaryotic 77 
cells 
common properties 27 
evolution 16-19 
flow of information 162 
major tissue types 56, 58 
microscopy 30-40 
origins and characteristics 3-25 
protein secretion from 91-2 
size, units for measuring 31-2
subcellular organisation 67-74


see also different species of 
cells; different types of cells 
cellulose 53 
centimorgan 130 
central dogma 202 
centrifugal force 72 
centrifugation 72, 73, 74, 93 
centromere 188 
centrosome 68, 80 
chain termination sequencing 192, 
193, 194 
chitin 50 
Chlamydomonas 48, 49 
Chlorophyta 48, 49, 50 
chloroplasts 20, 68, 95-6 
genome 178 
choanocytes 22 
chromatids 111, 112, 113, 114
chromatin 82, 83, 84, 169, 186, 
188, 228 
chromatin-remodelling proteins 229 
chromogenic substrate 177 
chromosomes 20 
condensation 83, 84 
duplication 197-8 
eukaryotic 108-10, 184-90 
independent assortment of genes 
on 123-5 
meiosis 112-14 
mitosis 110-12 
nucleolus and 85 
rearrangements 169, 197 
see also homologous 
chromosomes; sex 
chromosomes 
cisternae 90, 91
class 17, 18 
cloning vectors 173, 174, 175, 176 
clotting factors 136, 137 
coding RNAs 204 
codominance 118 
codon 163, 164, 166, 167, 202 
mRNA 164, 205-7, 208 
prokaryotic 170 
in translation 243, 244, 245, 246 
coenocytic 51 
coevolution 52 
cohesive ‘sticky’ ends 173, 174,
175
collagen 99, 100


colony 28 
colony-forming behaviour 22, 
23, 41 
protists 48, 49, 50 
combinatorial control 228 
competent cells 174, 183 
complementary base pairing 153, 
205, 206 
complementary DNA (cDNA) 
232, 233, 234, 235 
complementary strands 147, 153, 
203 
compound microscope 30, 31, 34
condenser lens 33, 34 
confocal microscope 39, 40 
conjugation 45, 172, 180-2 
connective tissue 58, 59, 60, 62, 63 
extracellular matrix 99 
consensus sequences 212, 222 
constitutive secretion 92 
constitutive transcription factors 
224, 225 
continuous variation 139-40 
contractile muscle 61 
copy number variation (CNV) 
195 
core polymerase 211 
core promoter 213, 222, 223, 225 
coronary heart disease 140 
cortisol 228 
cotransformation 183 
cotranslational localisation 88, 89 
covalent bonds 4 
Crick, Francis 144, 147, 202 
cristae 94, 95 
crossing over 113 
see also recombination 
cryostat 35 
crystal violet 36 
cuticle 55 
cyanobacteria 43, 76 
cyclic adenosine monophosphate 
(cAMP) 217-19 
cytokinesis 111, 112, 113, 114
cytoplasm 19
cytosine (C) 10, 146, 147, 148
deamination 157, 158
cytoskeleton 47, 68, 78-82 
cytosol 20, 37 


D 


Danio rerio 107 
daughter cells
fungi 52
mitosis 112, 113
ddNTPs (dideoxynucleoside
triphosphates) 192, 193
deamination, cytosine 157, 158
decomposition 52 
degeneracy 207 
deletions, chromosome 197 
denaturation 153, 154, 155
dendrites 62 
density 72 


density gradient centrifugation 73,
74
deoxyribonucleic acid (DNA) 9, 
105-6, 143 
arrangement in nucleus 83 
binding to histones 84 
cloning 172-7 
polypeptides from 202 
prokaryotic 75 
protein synthesis from 162-4 
structure 10 
structure and role 144-7
transcription 10, 82, 203-5
UV damage 158-60
deoxyribose sugar 145, 146, 147,
148, 158, 203 
depurination 158 
dermal tissue 53, 54, 55 
desmosomes 101 
detoxification 87 
Dicer 240, 241, 242
Dictyostelium discoideum 22, 23 
differential sedimentation 73, 74 
differentiation 22, 28, 201 
diploid 108-9, 112, 113
alleles 116
sexual life cycle 115
discontinuous variation 118, 
139-40 
disease-related genes 198, 199 
distal promoter region 225 
DNA see complementary DNA 
(cDNA); deoxyribonucleic acid 
(DNA); mitochondrial DNA; 
regulatory DNA elements 




DNA binding domain 214 
DNA binding motifs 214, 215
DNA chip 234, 235 
DNA glycosylases 158 
DNA ligase 151, 156, 158, 159, 
160 
in gene cloning 173, 174, 175 
DNA microarray 234, 235 
DNA polymerases 149, 150, 151,
152, 155, 158 
in DNA sequencing 192, 193 
Taq 153-4, 233 
see also polymerase chain 
reaction (PCR) 
DNA repair 155-62 
DNA replication 10, 11, 25, 82,
110, 147-54, 155, 192 
error correction 156 
error rate 151, 195 
eukaryotic 152, 188-90 
prokaryotic 152, 177, 178 
DNA sequences 130, 147, 153, 
162, 191-4 
repeating 186, 187
single nucleotide changes 167
transcription factors binding to 
214-16 
DNA topoisomerase 149, 150, 171
dNTPs (deoxyribonucleotide 
triphosphates) 147, 148, 153, 192, 
193 
domain 17, 18, 19
DNA binding 214
dominant alleles 118, 119, 121-3,
124 
family pedigrees and 134-6 
mutations and 168-9 
dormant cells 29 
double helix 144, 145, 147 
‘kink’ 158, 159
double immunolabelling 38, 39, 40 
double-strand breaks, repair 160-2 
double-stranded RNAs (dsRNAs) 
242 
Drosha 240, 241
Drosophila melanogaster 107, 108, 
119-20 
genetic maps 129, 130-1 
genome size 185 
karyotypes 109, 110 




sex-linked genes 131-3 
transposons 187 
white gene 168, 169 


Duchenne muscular dystrophy 136,
185
duplication 
chromosomes and genomes 
197-8 
genes 195-6, 197 
dystrophin 185 


E 


Earth, origins of life on 3-13 
elastin 99 
electromagnetic radiation 34 
electron micrographs 
chloroplast 96 
conjugation 180
endoplasmic reticulum 87
Golgi apparatus 91
lysosomes and peroxisomes 93 
mitochondria 95 
nucleus 82 
ribosomes 86 
see also scanning electron 
micrographs; transmission 
electron micrograph 
electron microscopy 35, 67-71 
structures visible with 33 
electroporation 183 
elongation 153-4, 155, 207, 210, 
236 
polypeptide 246-8 
endocrine cells 58 
endocytosis 47, 92 
endoplasmic reticulum 68, 82, 
87-90 
endosomes 92 
endospores 75 
endosymbiotic theory 20-1, 32, 
138 
endothelial cells 60 
enhancers 225 
enzymes 5 
digestive 92 
in DNA replication 11 
in immunolabelling 38 
repair 151, 156, 159 




see also groups of enzymes; 
specific enzymes 
eosin 37, 57 
epidermal cells 54, 55 
epidermis 55 
episomes 171 
see also plasmids 
epithelial cells 58, 59-60 
cell junctions 100, 101
small intestine 80, 91-2 
Escherichia coli 17, 44, 45, 141 
conjugation 180, 181
in DNA cloning 174, 175, 176
gene structure 170
genetic mapping 182
genome 148
lac operon 216-17, 218
model organism 107, 108
O157:H7 172
RNA polymerase 211, 212
synthetic ribosomes 25 
euchromatin 82, 83, 228 
Eudorina elegans 49 
Eukarya 17, 18, 19 
eukaryotes 18, 20-1 
DNA replication 152, 188-90 


genes and chromosomes 184-90 


genome 186-8 
mismatch repair 156 
mRNA processing 209-10 
organelle genome 177-9 
protein localisation 89 
ribosomes 86 
sexual life cycles 114-16 
transcription 185, 221-37
factors regulating 224-6 
initiation 222-4 
initiation coordination 227-8 
post-transcriptional control 
237-43 
vs prokaryotes 221 
transfer RNA 206 
translation 244, 246 
eukaryotic cells 18 
chromosomes 108-10, 184-90 
diversity 46-64 
division 110-14 
histochemical staining 37 
key functions 77 
main features 27 


organisation 28, 77-97 
size 32 


see also animal cells; plant cells 


evolution 12 

cells 16-19 

chemical 6-8 

coevolution 52 

genome 194-9 
exocytosis 92 
exons 185-6, 187, 209

mRNA splicing 238, 239 
exonuclease 156 
extracellular matrix 99-100 
eye colour 131, 132, 133
eyepiece lens 33, 34 


F 


F plasmids 171, 180, 181, 182

F₁ progeny 117, 118, 121, 122,
126-7, 129

F₂ progeny 121, 122, 123, 127

fats see lipids 

fatty acids 5 

Felis catus 17, 18 

female karyotypes 109, 110, 131

ferritin 248-9 

fibrils 100 


fibroblasts 58, 62, 63, 81, 99, 100,
238-9
fibronectin 99, 100, 238-9 
filaments see intermediate 
filaments; microfilaments 
fimbriae 45 
fitness 12 
flagella/flagellum 45, 48 


fluorescence micrograph 62, 81, 95


fluorescence microscope 38 
fluorophore 38, 39, 193 
formaldehyde 8 
formic acid 8 
frameshift mutations 166, 167 
Franklin, Rosalind 144 
fruiting body 22, 23, 50 
Fungi 18 
fungi 50-3 

sexual life cycle 115


G 
G1 (gap 1) phase 111 


G2 (gap 2) phase 111 
gain of function mutations 169 
β-galactosidase 171, 174, 176, 216,
217, 219
reporter gene 226
β-galactoside permease 216, 217,
219 
gametes 108-9, 112, 113
fertilisation 118, 119, 121, 122
genetic diversity 114 
genotypes 124, 125
from linkage and recombination 
126-7, 128 
sexual life cycle 115
see also sex chromosomes 
gametophyte 115 
ganglia 61 
gap junctions 61, 101 
gas vesicles 75 
gel electrophoresis 191-2, 193,
230, 231 
gene cloning 153, 172-7 
gene expression 23, 63, 201-11 
coordinated 230-5 
first stage 203-5 
post-transcriptional control 
237-43 
reporter gene analysis 226 
second stage 205-7, 208
translational/post-translational 
control 248-9 
gene family 186, 196, 197
gene regulation 201 
eukaryotic 221-37 
prokaryotic 211-21 
gene therapy 242 
genera 17 
general transcription factors 
223-4, 227 
genes 9, 63, 106 
disease-related 198, 199 
duplication 195-6, 197 
encoding 143 
eukaryotic 184-90 
homologous 195 
housekeeping 224 
human globin genes 196, 197 
independent assortment on 
chromosomes 123-5
multiple-copy 186 


mutation 119-20, 164-70 
prokaryotic 170-1 
pseudogene 196 
reporter 226 
sex-linked 131-3 
transcription see transcription 
translation see translation 
see also alleles 
genetic carriers 136, 137 
genetic code 163, 164, 179, 205 
degeneracy 207 
genetic information 162-4 
genetic maps 129-31 
recombination 181, 182
transformation 183 
genetic variation 11, 114, 115 
copy number 195 
discontinuous 118 
discontinuous versus continuous 
139-40 
Pisum sativum 117 
genetics 105 
Mendelian 116-29 
model organisms 106-8 
nomenclature 119-20 
prokaryotes 180-3 
genome 9, 106 
duplication 197-8 
Escherichia coli 148 
eukaryotic 186-8 
mouse 165 
prokaryotic 171-7 
prokaryotic vs eukaryotic 
organelle 177-9 
sequencing 191-4 
size 108, 184, 185 
synthesis 24 
variation and evolution 194-9 
viral 190 
see also human genome 
genome-wide association studies 
(GWAS) 198-9 
genotype 106, 121-3 
autosomal dominant mutation 
135 
autosomal recessive mutation 
135, 136, 137 
independent assortment of genes 
123-5 




linkage and recombination 127, 
129 
nomenclature 120 
pure-breeding 118, 119
genus 17, 18
germ cells 58, 108, 112, 189 
Giardia lamblia 47 
globins 196, 197 
glucocorticoid receptor 228 
glucose 217, 218, 219
aerobic respiration 16 
glutaraldehyde 69 
glycerol 5 
glycine 8 
glycogen granules 75 
glycolysis 15 
glycoproteins 5, 78, 89 
glycosaminoglycans (GAGs) 99 
glycosylation 89 
gold particles 69, 71
Golgi apparatus 68, 70, 88, 90-1 
Gonium 48, 49 
Gram-negative bacteria 36, 45, 75, 
76 
Gram-positive bacteria 36, 75, 76, 
178 
Gram stain 36, 75 
grana/granum 95, 96 
Great Oxygenation Event 15, 17 
green fluorescent protein (GFP) 
gene 226 
ground tissue 53-4 
growth factors 221, 242 
guanine (G) 10, 146, 147, 148, 158
guanosine diphosphate (GDP) 244, 
245, 246, 247 
guanosine triphosphate (GTP) 237, 
244, 245, 246, 247-8 
guanylyl transferase 237 
guard cells 55 


H 


haemoglobin 196 

haemophilia 133, 136, 137 
haemophilia A 137

haemophilia B 137
Haemophilus influenzae 182, 191 
haematoxylin 37, 57 

hairpin loop structure 219, 220 




half-life 204 
haploid 108, 112, 113
sexual life cycle 115
haplotype 198
helicases 149, 150, 151, 152, 159,
160
Rho factor 219
helix-turn-helix motif 214, 215
hemidesmosomes 101 
heritable traits 117-23 
herpes simplex virus 240 
heterochromatin 82, 83, 169, 188, 
228 
heterogametic 131 
heterosomes 109-10, 110, 131-3 
heterotrophic absorbers 51 
heterotrophs 14, 15 
heterozygous 116, 123-4, 125, 127
Hfr (high frequency of
recombination) 180, 181, 182
histochemistry 36-7 
histology 35 
histone acetylases 228, 229 
histones 83, 84, 85, 186, 228, 229 
HIV-1 (AIDS) virus 236 
Homo sapiens 17 
homogenisation 72, 73 
homologous chromosomes 109, 
111, 112-13
recombination 128
homologous end-joining 161-2 
homologous genes 195 
homozygous 116, 118, 127 
Hooke, Robert 30, 31
horizontal gene transfer 172, 180 
hormones 221 
secretion 91, 92 
housekeeping genes 224 
human genome 148-9 
composition 186, 187
genetic variation 195
mitochondrial 179
size 185
Human Genome Project 194
human globin genes 196, 197
human karyotypes 109, 110, 131
human pedigree analysis 134-8, 
139 
Huntington’s disease 134, 135 
hydrogen 8 




hydrogen bonds 4, 145, 146
hydrogen cyanide 8
hydrogen peroxide 93
hydrogen sulfide 15
hydrophilic heads 13, 78
hydrophobic tails 13, 78
hypermorphs 169 

hypha(e) 50, 51
hypomorphs 168


I


immune system cells 58, 59 
immunocytochemistry 38 
immunoelectron microscopy see 
immunolabelling 
immunohistochemistry 38 
immunolabelling 38-9, 40, 69-71 
in vitro 41 
cell cultures 43 
DNA replication 192 
DNA synthesis 153 
gene cloning 173 
in vivo 41 
independent assortment 114, 
123-5 
inducible transcription factors 224, 
225 
influenza virus 190, 238 
initiation 207, 210 
coordination 227-8, 229 
transcription 211-14, 222-4 
translation 244, 245, 246 
integrins 99, 100, 238


intermediate filaments 68, 79, 80,
81-2
keratin 82 
lamins 85 
intermembrane space 94, 95 
interphase 111, 112 
‘interrupted mating’ experiments 
181, 182 
introns 185-6, 187, 209, 210
mRNA splicing 238, 239 
inversions, chromosome 197 
iodine solution 36 
ions 4 


JK 
junk DNA 106 


karyotype 110 

human and Drosophila 109, 131 
keratin 82 
keratinocyte cell line 39 
kingdom 18 
‘kink,’ double helix 158, 159 
Kozak sequence 244 


L 


lac control region 216 
lac operon 170-1, 216-19 
lac repressor 217, 218, 219
lacA gene 171, 216
lactose 170-1, 216, 217, 218, 219
lacY gene 171, 216
lacZ gene 171, 174, 176, 177, 216,
217
lagging strand 150, 151, 152, 188,
189 
lamellae 95, 96 
lamins 85 
last universal common ancestor 
(LUCA) 16-17, 19 
lead 69 
leading strand 150, 151, 152, 188,
189 
Leishmann’s stain 57 
lenses 33, 34, 35 
leucine zipper motif 214, 215
leukocytes 57, 58, 62, 70 
life 
definition 16 
synthetic 24-5
tree of 17-18, 19 
light micrographs 
Amoeba proteus 48 
animal tissues 57 
fibroblast cells 81
human culture cells 95 
plant cells 54 
rat small intestine 59, 60, 61, 62 
light microscope 
in cell culture 42 
cells and tissues 35-9, 40 
mode of function and resolution 
33-5 
structures visible with 33 
linkage 125-9 


sex linkage in human pedigrees 
136-8 
sex-linked genes 131-3 
lipids 5, 78 
see also phospholipids 
logarithmic scale 33 
luciferase 226
lysate 74 
lysogeny 183 
lysosomes 68, 74, 88, 92-3, 94 


M 


M phase 111 
macromolecules 3, 5, 14 
major groove 144, 145 
male karyotypes 109, 110, 131 
mammals 
cells 58, 59-64 
genome size 184 
Manhattan plot 199
mediator complex 227
meiosis 112-14, 115
recombination 128
meiosis I 112
meiosis II 112
membrane proteins 78
Mendel, Gregor 105, 116, 117
Mendelian genetics 116-29 
messenger RNA (mRNA) 162, 
163 
codons 164, 205-7, 208
DNA microarrays 234, 235 
eukaryotic 185, 209-10 
half-life 204 
modification of the ends of 
237-8 
northern blotting 230, 231, 232
prokaryotic 209-10 
RT-PCR 232, 233 
splicing 185, 209, 238-9 
stability 239-40 
transcription and translation 10,
82, 202 
transcription termination 236 
translation 85, 86, 88, 89, 205-7, 
208, 248 
metabolism 14 
metaphase 111, 112, 113, 114, 122,
124, 128 


meteors 7 
methane 8, 15 
methionine 206, 244 
micelles 13
microfilaments 68, 79-80, 81
micrometre (µm) 31, 32, 33, 35
microRNAs (miRNAs) 205, 240-3 
microscopy 30-40 
in cell culture 41, 42, 43 
see also electron microscopy 
microsomes 72 
microtome 35 
microtubule organising centre 
(MTOC) 80 
microtubules 68, 79, 80-1 
microvilli 60, 80, 92 
Miller, Stanley 6-7, 8 
minor groove 144, 145
miR-1 240
mismatch repair (MMR) 156
missense mutation 166, 167
mitochondria 20, 68, 70, 94-5
genome 178, 179
mutation 138-9
size 32
mitochondrial DNA 94
mitochondrial genetic code 179
mitochondrial matrix 94, 95
mitosis 110, 111-12, 113, 115
mitotic spindle 111, 112
model organisms 106-8 
molecular lesions 156 
molecules 
biosynthesis 14-16 
macromolecules 3, 5, 14 
organic 4, 5, 6-8 
polar 4 
size 33 
monogenic inheritance 140 
monomers 5 
monosaccharides 5 
Morgan, T.H. 119, 129, 130 
morphology 28 
bacteria 43, 44, 45 
mosses 115 
mouse 107, 108
genome 165 
mRNA splicing 185, 209, 238-9 
mucus 91, 92 
Mullis, Kary 153 




multicellular organisms 3 
multicellularity 21-3, 28 
multiple cloning site 176
multiple-copy genes 186 
muscle cells see skeletal muscle; 
smooth muscle cells 
mutation 119-20, 155 
autosomal 134, 135, 136, 137 
consequences of 164-70 
gain of function 169 
loss of function 168 
mitochondrial 138-9 
mycelia/mycelium 51 
Mycoplasma capricolum 24 
Mycoplasma mycoides 24 
myocardial infarction 199
myofibrils 57 
myoglobin 196, 197 


N 


N-terminus 207 
nanometre (nm) 32, 33, 35 
natural selection 11-12, 105
negative inducible regulation 217 
Neisseria meningitidis 71 
neomorph 169 
nerve cells 58, 59, 61, 62 
neurodegenerative disease 
Huntington’s disease 134, 135 
non-nuclear inheritance 138, 139 
‘nicks’ 156 
non-coding RNA (ncRNA) 204-5, 
240 
non-homologous end-joining 161 
non-nuclear inheritance 138-9 
non-template strand 204 
non-Watson-Crick structure 206, 
207 
nonsense mutation 166, 167
northern blotting 230, 231, 232
nuclear envelope 82, 85, 209 
nuclear lamina 85 
nuclear localisation sequence 86 
nuclear membrane 20 
nuclear pores 82, 85, 209 
nuclease 150, 160
nucleic acids 
constituent molecules and 
functions 5 




hereditary information 9-12 
hybridisation 230 
primordial 6, 7 
see also deoxyribonucleic acid 
(DNA); ribonucleic acid 
(RNA) 
nucleoid 19, 75 
nucleoli/nucleolus 68, 70, 82, 83 
chromosome arrangement 85 
nucleosomes 84, 228-9 
nucleotide excision repair (NER) 
158-9, 160 
nucleotides 5, 10, 106, 145, 146 
changes in single 165-6, 167, 
168 
in DNA replication 148, 149,
151 
oligonucleotides 153 
nucleus 20, 68, 70, 82-5 
DNA arrangement in 83 
histochemical staining 37 
major activities 82 
protists 48 
null mutations 168, 169 


O


objective lens 33, 34 
Okazaki fragments 150, 151, 152
oligonucleotides 153
oligosaccharides 71
open reading frame (ORF) 164,
166
prokaryotic 170
operator (O) sequence 216, 218
operons 170, 171
bacterial 216-19
order 17, 18
organ systems 28 
organelles 14 

biogenesis 97 

endosymbiotic theory 20-1, 32 

eukaryotic 27 

genome 177-9 

size 33 

see also specific organelles 
organic molecules 4, 5 

chemical evolution 6-8 
organs 28 
origin of replication 148-9, 150




cloning vector 176 
prokaryotic 177, 178, 180, 181
osmium tetroxide 69 
outer membrane 75, 76 
oxygen, atmospheric 15 


P 


P element transposon 187 
P (polypeptide) site 247 
palisade cells 53, 54 
parenchyma cells 53, 54 
partial loss of function mutations 
168 
Pediastrum 49 
pedigree analysis 134-8, 139 
pellet 73, 74 
peptides 9 
see also polypeptides 
peptidoglycan 75, 76 
periplasmic space 75, 76 
peristalsis 59, 61 
peroxisomes 68, 74, 93-4 
personalised medicine 194 
PEST sequences 249 
phagocytosis 76 
phase contrast microscopy 41, 42 
phenotype 11, 106, 121-2 
independent assortment of genes 
123, 125 
linkage and recombination 127, 
129 
pure-breeding 118, 119
phenotypic variation 11-12,
119-20 
phloem 54 
phosphodiester bond 147, 148, 203
phospholipids 13, 14, 20, 78 
production 87, 97 
photosynthesis 15, 54, 55, 95 
phylogeny 17, 18 
phylum 17, 18
physiological solutions 72
pili/pilus 45, 180, 181, 182
Pisum sativum 116, 117, 117-18, 
119, 121-9 
plant cells 53-5 
schematic diagram 68 
sexual life cycle 115
size 32, 33 


whole genome duplication 198 
Plantae 18 
plasma membrane 13, 14, 20 
plasmids 171-7 
E. coli 180, 181, 182 
replication 178 
plasmodesma(ta) 54, 68, 101 
plastids 20 
Pleodorina californica 49 
polar molecules 4 
polarised 60, 61, 62, 92 
poly(A) tail 238 
polyadenylation 209, 237, 238 
polycistronic transcript 170, 171,
240, 241 
polygenic inheritance 140 
polymerase chain reaction (PCR) 
152-4, 155, 176 
qPCR 232, 233 
RT-PCR 232, 233 
polymerisation, amino acids 9 
polymers 5 
polypeptides 9 
elongation 246-8 
formation 10, 202
signal sequence 86 
translated 88, 89 
translation from mRNA 205-7, 
208 
polyribosomes 86, 248 
polysaccharides 5, 76 
polysomes 86, 248 
populations 11-12 
positive inducible regulation 219 
post-transcriptional modification 
209 
post-transcriptional control 237-43 
post-translational control 248-9 
post-translational localisation 89 
post-translational modifications 
209 
preservation, tissues 35, 69 
primary antibody 38, 39, 40, 71
primary cells 43
primary RNA transcript 209
primase 150, 151, 152
primer 151
annealing 153, 154, 155
programmed cell death 97 
prokaryotes 18, 19-20 


DNA replication 152, 177, 178 
gene structure 170-1 
genetics 180-3 
genome vs eukaryotic organelle 
177-9 
genomes and plasmids 171-7 
mRNA processing 209-10 
ribosomes 75, 86 
transcription 
initiation 211-14 
termination 219, 220 
vs eukaryotes 221 
translation 244, 245 
see also bacteria; Escherichia 
coli 
prokaryotic cells 18 
cytoskeleton 79 
diversity 43, 44, 45-6 
division 110 
main features 27 
organisation 75-7 
size 30 
promoter region 185, 186, 212-13,
216, 218, 219
core 213, 222, 223, 225
prophase 111, 112, 112, 113
proteasome 249 
protein sorting 86 
protein synthesis 
from DNA 162-4 
at endoplasmic reticulum 87 
at ribosomes 85-6 
protein targeting 86 
proteins 
capping 80-1 
catabolite activator 218-19 
cellular functions 28 
chromatin-remodelling 229 
constituent molecules and 
functions 5, 9 
cytoskeletal 79, 80 
in DNA replication 149, 150 
glycoproteins 5, 78, 89 
localisation in eukaryotes 89 
membrane proteins 78 
release factor 248 
secretion from cells 91-2 
transmembrane 88, 89 
transporter proteins 60, 85 
turnover 249 


see also amino acids; enzymes; 
polypeptides 
proteoglycans 91, 92, 99 
protists 18 
cell diversity 47-50 
protofilaments 79, 82 
pseudogene 196 
pseudopodia 47, 48 
Punnett square 121, 122, 124, 125,
126, 127, 132, 133
pure-breeding 117-18, 119, 126
purines 146, 166 
depurination 158 
pyrimidines 146, 158, 166 


QR 


quantitative PCR (qPCR) 232, 233 
reactive oxygen species (ROS) 156 
recessive alleles 118, 119, 121-3,
124 
family pedigrees and 134-7 
mutations and 168-9 
recombinant (genetic) 127 
recombinant (molecular cloning) 
174 
recombination 113, 114 
genetic maps 181, 182
and linkage 125-9 
recombination frequency 129-31 
red blood cells 57, 58 
red-green colour vision 133 
redundancy 207 
genetic code 163 
regulated secretion 92 
regulatory DNA elements 214-16, 
224, 225, 226, 227 
regulatory regions 170, 171, 185
release factor (RF) protein 248
repair enzymes 151, 156, 159
replication bubble 149, 150, 151
replication complex 149, 152
replication fork 149, 150, 151,
152, 178 
reporter genes 226 
resolution 34, 67 
response factors 224, 225 
restriction endonucleases 173, 
174, 175, 176 
retroviruses 190, 232 




reverse transcriptase 190, 232, 233 
reverse transcription 190 
Rho factor 219 
rhodopsin 88
ribonucleases 239-40, 241
ribonucleic acid (RNA) 10, 143 
catalytic activity 11 
from DNA transcription 203-5 
primer 151 
structure 162 
ribonucleoside triphosphates 204 
ribonucleotides 203, 204 
ribosomal RNA (rRNA) 85, 86,
186, 205 
domains and 19
mitochondrial genome 179
production 83 
ribosome binding site 244, 245 
ribosomes 10, 20, 68 
assembly 82 
endoplasmic reticulum 87 
mRNA translation 88, 89, 207, 
208 
organisation 86 
polyribosomes 86, 248 
prokaryotic 75, 86 
protein synthesis 85-6 
structure 243 
synthesis 25 
translation initiation 244, 245, 
246 
RNA see messenger RNA 
(mRNA); microRNAs 
(miRNAs); ribonucleic acid 
(RNA); ribosomal RNA (rRNA); 
transfer RNA (tRNA) 
RNA-induced silencing complex 
(RISC) 240, 241, 242 
RNA interference 106 
RNA interference (RNAi) 242 
RNA polymerase 204 
bacterial transcription 211-14 
eukaryotic transcription 222, 
223-4, 227, 228, 229, 236 
lac operon 216-17, 218, 219
RNA polymerase I 222 
RNA polymerase II 222, 223-4, 
227, 237 
RNA polymerase III 222 
rolling circle replication 177, 178 




Romanov family 136, 137 

rough endoplasmic reticulum 
(RER) 68, 87, 88, 89, 90, 248 

royal families of Europe 137

RT-PCR (reverse transcription 
polymerase chain reaction) 232, 
233 


S


S (synthesis) phase 111 
Saccharomyces cerevisiae 51, 52, 
191 
gene expression 234 
model organism 107, 108
safranin 36 
satellite DNA 188 
scanning electron micrographs 
bacteria 44 
Dictyostelium discoideum 23 
fungi 51, 52
stomata 55 
scanning electron microscopy 
(SEM) 67 
Scenedesmus 49 
secondary antibody 38, 39, 40, 71
secretion, proteins 91-2 
secretory vesicles 90 
sections 35 
sedimentation, differential 73, 74 
sedimentation coefficient ‘S’ 86 
segmental duplication 197-8 
selective pressures 46 
coevolution 52 
protists 47 
sense strand 203, 204 
septa 51 
sex chromosomes 109-10, 131-3 
in human pedigrees 136-8 
see also gametes 
sexual life cycles 114-16, 118 
SI units 31 
sigma (σ) factor 212, 213
signal recognition particle (SRP) 
88 
signal sequence 86 
silencers 225 
silent mutations 166, 167 
simian virus 40 (SV40) 225 
simple sequence repeats 188 




single nucleotide polymorphisms 
(SNPs) 195, 198, 199 
single-strand DNA binding (SSB) 
proteins 149 
skeletal muscle 57, 58 
small interfering RNAs (siRNAs) 
242-3 
small intestine 
cells 59-62, 63 
epithelial cells 80, 91-2 
small nuclear RNAs (snRNAs) 238 
smooth endoplasmic reticulum 
(SER) 68, 87 
smooth muscle cells 58, 59, 61, 62 
contraction 64 
gap junctions 101 
somatic cells 108, 190 
southern blotting 230 
speciation 11 
species 
survival and variation 12 
taxonomy 17, 18
specific transcription factors 216, 
221, 224-6, 227 
Spirillum minus 44 
Spirillum volutans 44 
spirochaetes 45 
spliceosome 238 
splicing 238-9 
RNA 209 
sponges 22, 23, 28 
spores 22, 23, 115 
sporophyte 115 
start codon 163, 164, 205, 207 
start-transfer signal 88, 89 
stem cells 22, 189 
steroids, production 87 
stoma(ta) 55 
stop codons 163, 164, 167, 205, 
207 
stop-transfer signal 88, 89 
Streptococcus pneumoniae 44, 144, 
182 
stroma 95, 96 
Sturtevant, Alfred 129, 130 
sugar-phosphate backbone 144,
145, 146, 147, 148, 158 
‘sulfur pearls’ 46 
supercoiling 171, 172
supernatant 73, 74 


support cells 58 

surface area to volume ratio 47 
symbionts 21 

symbiosis 45, 52 

synapsis 112 

synaptonemal complex 112 
synthetic life 24-5 

Synthia 24 


T 


Takifugu rubripes 107 
Taq DNA polymerase 153-4, 233 
TATA box 185, 222, 223
taxonomy 17-18 
telomerase 189-90 
telomeres 188-90 
telophase 111, 112, 113, 114
template strand 203, 204, 213 
termination 207, 210 
eukaryotic transcription 236 
prokaryotic transcription 219, 
220 
translation 248 
terminator sequence 214 
TFIID (transcription factor IID) 
223 
thermodynamic laws 14 
Thermus aquaticus 154 
Thiomargarita namibiensis 45, 46 
thylakoids 95, 96 
thymine (T) 10, 146, 147, 148
dimers 158, 159, 160 
tight junctions 101 
tissues 28 
cell culture 41 
cell interactions in 99-102 
light microscopy 35-9, 40 
major types in cells 56, 58 
vascular, ground and dermal 
53-5 
see also connective tissue 
tonoplast 96 
totipotency 22 
toxins, plasmids and 172
traits 12 
transactivation domain 214 
transcription 10, 82, 162, 163,
202 
DNA into RNA 203-5 


eukaryotic 185, 221-37
coordinated gene expression 
230-5 
elongation and termination 
236 
initiation coordination 227-8 
post-transcriptional control 
237-43 
transcription machinery 228-9 
vs prokaryotes 221 
main stages 210 
prokaryotic 
initiation 211-14 
termination 219, 220 
reverse 190 
transcription bubble 213, 214, 224
transcription factors 214-16 
constitutive and inducible 224, 
225 
eukaryotic 224-6 
eukaryotic gene regulation 
control 227 
general 223-4, 227 
specific 216, 221, 224-6, 227 
transcription initiation complex 
213, 222-4 
transcription start site 213, 214,
222 
transcription terminator 219, 220 
transcriptional activators 214, 
217-19, 228, 229 
transcriptional repressors 214 
lac 217, 218 
transduction 183 
transfer RNA (tRNA) 179, 205-7,
208, 243, 244, 245, 246, 247 
transformation 174, 182-3 
transitions 166, 167 
translation 10, 88, 89, 162, 163,
202, 243-9 
control of gene expression 248-9 
eukaryotes 244, 246 
mRNA into polypeptides 205-7, 
208 
prokaryotes 244, 245 
termination 248 
translation initiation factors 244, 
245, 246 
translocation 247-8 
transmembrane proteins 88, 89 


transmission electron micrograph 
70 
transmission electron microscopy 
(TEM) 68-9 
transport vesicles 90, 91
transporter proteins 60, 85 
transposable elements 187 
see also transposons 
transposons 186, 187-8 
gene duplication 195 
transversions 166, 167 
tree of life 17-18, 19 
Trichophyton interdigitale 51 
triplet codons 164, 166 
tryptophan 206, 219 
tubulin 79, 80, 240 
turgidity 96


U


ultracentrifugation 72 

ultraviolet (UV) damage, DNA 
158-60 

unicellular organisms 3 

protists 47-8 

universal genetic code 179

upstream promoter region 
(proximal promoter) 225 

uracil 157, 158 

uranium 69 

Urey, Harold 6, 8 


V


vacuoles 68, 96-7 
valine 207 
variation 11-12 
genome 194-9 
see also genetic variation; 
phenotypic variation 
vascular tissue 53, 54 
VEGF growth factor 242 
vesicles 75, 90, 91 
Victoria, Queen 136, 137 
video (time-lapse) microscopy 43 
viruses 3 
genomes 190 
herpes simplex virus 240 
HIV-1 (AIDS) virus 236 
influenza 190, 238 
retroviruses 190, 232 




simian virus 40 (SV40) 225 
transduction 183 
Volvox 49-50 


W


water 
early atmospheric 8 
hydrogen bonding 4 
Watson, James 144, 147 
Watson-Crick structure 144, 145,
149, 206 
wavelength 34, 67 
white gene 131-2, 168, 169 
whole genome duplication 198 
‘wild type’ 119-20, 131, 132, 168,
181 
wobble base pairs 206, 207 


X


X chromosome 130, 131-3
linked haemophilia 137

X-ray diffraction 144 

Xenopus laevis 198 

xeroderma pigmentosum 159 

Xgal 226 

xylem 54, 55 


Y


Y chromosome 131-3 
yeast see Saccharomyces cerevisiae 


Z


Z-DNA 144
zinc finger motif 214, 215
zygote 115




S294 Cell biology
Science: Level 2 


Book 1 Generating Diversity
Book 2 Working Cells
Book 3 Challenging Cells


ISBN 978-1-7800-7880-9 




